1 Introduction 



There are several good reasons you might want to read about uniform spanning trees, one 
being that spanning trees are useful combinatorial objects. Not only are they fundamental 
in algebraic graph theory and combinatorial geometry, but they predate both of these 
subjects, having been used by Kirchhoff in the study of resistor networks. This article
addresses the question about spanning trees most natural to anyone in probability theory, 
namely what does a typical spanning tree look like? 

Some readers will be happy to know that understanding the basic questions requires 
no background knowledge or technical expertise. While the model is elementary, the 
answers are surprisingly rich. The combination of a simple question and a structurally 
complex answer is sometimes taken to be the quintessential mathematical experience. 
This notwithstanding, I think the best reason to set out on a mathematical odyssey is
to enjoy the ride. Answering the basic questions about spanning trees depends on a 
sort of vertical integration of techniques and results from diverse realms of probability 
theory and discrete mathematics. Some of the topics encountered en route are random 
walks, resistor networks, discrete harmonic analysis, stationary Markov chains, circulant 
matrices, inclusion-exclusion, branching processes and the method of moments. Also 
touched on are characters of abelian groups, entropy and the infamous incipient infinite 
cluster. 

The introductory section defines the model and previews some of the connections 
to these other topics. The remaining sections develop these at length. Explanations of 
jargon and results borrowed from other fields are provided whenever possible. Complete 
proofs are given in most cases, as appropriate.

1.1 Defining the model



Begin with a finite graph $G$. That means a finite collection $V(G)$ of vertices along with
a finite collection $E(G)$ of edges. Each edge either connects two vertices $v$ and $w \in V(G)$
or else is a self-edge, connecting some $v \in V(G)$ to itself. There may be more than one
edge connecting a pair of vertices. Edges are said to be incident to the vertices they
connect. To make the notation less cumbersome we will write $v \in G$ and $e \in G$ instead
of $v \in V(G)$ and $e \in E(G)$. For $v, w \in G$ say $v$ is a neighbor of $w$, written $v \sim w$, if and
only if some edge connects $v$ and $w$. Here is an example of a graph $G_1$ which will serve
often as an illustration.



[figure 1: the graph $G_1$]

Its vertex set is $\{A, B, C, D, E\}$ and it has six edges $e_1, \ldots, e_6$, none of which is a self-edge.

A subgraph of a graph $G$ will mean a graph with the same vertex set but only a
subset of the edges. (This differs from standard usage which allows the vertex set to be
a subset as well.) Since $G_1$ has 6 edges, there are $2^6 = 64$ possible different subgraphs of
$G_1$. A subgraph $H \subseteq G$ is said to be a forest if there are no cycles, i.e. you cannot find
a sequence of vertices $v_1, \ldots, v_k$ for which there are edges in $H$ connecting $v_i$ to $v_{i+1}$ for
each $i < k$ and an edge connecting $v_k$ to $v_1$. In particular ($k = 1$) there are no self-edges
in a forest. A tree is a forest that is connected, i.e. for any $v$ and $w$ there is a path
of edges that connects them. The components of a graph are the maximal connected
subgraphs, so for example the components of a forest are trees. A spanning forest is a
forest in which every vertex has at least one incident edge; a spanning tree is a tree in
which every vertex has at least one incident edge. If $G$ is connected (and all our graphs
will be) then a spanning tree is just a subgraph with no cycles such that the addition
of any other edge would create a cycle. From this it is easy to see that every connected
graph has at least one spanning tree.

Now if $G$ is any finite connected graph, imagine listing all of its spanning trees (there
are only finitely many) and then choosing one of them at random with an equal probability
of choosing any one. Call this random choice $T$ and say that $T$ is a uniform random
spanning tree for $G$. In the above example there are eleven spanning trees for $G_1$ given
(in the obvious notation) as follows:

$e_1e_2e_3e_4$   $e_1e_2e_3e_5$   $e_1e_2e_4e_5$   $e_1e_3e_4e_5$

$e_2e_3e_4e_5$   $e_1e_2e_4e_6$   $e_1e_3e_4e_6$   $e_2e_3e_4e_6$

$e_1e_2e_5e_6$   $e_1e_3e_5e_6$   $e_2e_3e_5e_6$

In this case, T is just one of these eleven trees, picked with uniform probability. The 
model is so simple, you may wonder what there is to say about it! One answer is that the 
model has some properties that are easy to state but hard to prove; these are introduced 
in the coming subsections. Another answer is that the definition of a uniform random 
spanning tree does not give us a way of readily computing local characteristics of the 
random tree. To phrase this as a question: can you compute probabilities of events local
to a small set of edges, such as $P(e_4 \in T)$ or $P(e_1, e_4 \in T)$, without actually enumerating
all of the spanning trees of $G$? In a sense, most of the article is devoted to answering
this question. (Events such as $e_1$ being in the tree are called local in contrast to a global
event such as the tree having diameter - longest path between two vertices - at most
three.)

1.2 Uniform spanning trees have negative correlations



Continuing the example in figure 1, suppose I calculate the probability that $e_1 \in T$.
That's easy: there are 8 spanning trees containing $e_1$, so

$$P(e_1 \in T) = \frac{8}{11}.$$

Similarly there are 7 spanning trees containing $e_4$, so

$$P(e_4 \in T) = \frac{7}{11}.$$

There are only 5 spanning trees in the list above containing both $e_1$ and $e_4$, so

$$P(e_1 \in T \text{ and } e_4 \in T) = \frac{5}{11}.$$

Compare the probability of both of these edges being in the tree with the product of the
probabilities of each of the edges being in the tree:

$$\frac{8}{11} \cdot \frac{7}{11} = \frac{56}{121} > \frac{55}{121} = \frac{5}{11}.$$

Thus

$$P(e_1 \in T \mid e_4 \in T) = \frac{P(e_1 \in T \text{ and } e_4 \in T)}{P(e_4 \in T)} = \frac{5}{7} < \frac{8}{11} = P(e_1 \in T),$$

or in words, the conditional probability of $e_1$ being in the tree if you know that $e_4$ is in
the tree is less than the original unconditional probability. This negative correlation of
edges holds in general, with the inequality not necessarily strict.
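The counts in this example are small enough to check by brute force. The sketch below enumerates every 4-edge subgraph and keeps the acyclic ones. The endpoints assigned to $e_1, \ldots, e_6$ are an assumption, since the drawing of figure 1 is not reproduced here: they are a hypothetical labelling (a 4-cycle and a triangle sharing $e_6$) chosen because it yields exactly the eleven trees listed in Section 1.1. The helper names `is_spanning_tree` and `prob` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical endpoints for the edges of G_1 (an assumption, see the lead-in):
# a 4-cycle A-B-C-D sharing the edge e6 = {A, D} with a triangle A-D-E.
EDGES = {
    "e1": ("A", "B"),
    "e2": ("B", "C"),
    "e3": ("C", "D"),
    "e4": ("A", "E"),
    "e5": ("D", "E"),
    "e6": ("A", "D"),
}
VERTICES = "ABCDE"


def is_spanning_tree(edge_names):
    """True if the chosen edges contain no cycle and number |V| - 1."""
    parent = {v: v for v in VERTICES}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for name in edge_names:
        a, b = (find(v) for v in EDGES[name])
        if a == b:  # both endpoints already connected: this edge closes a cycle
            return False
        parent[a] = b
    # |V| - 1 acyclic edges on |V| vertices are automatically connected
    return len(edge_names) == len(VERTICES) - 1


TREES = [set(c) for c in combinations(sorted(EDGES), len(VERTICES) - 1)
         if is_spanning_tree(c)]


def prob(*edges):
    """P(all the given edges lie in the uniform spanning tree T)."""
    hits = sum(1 for t in TREES if set(edges) <= t)
    return Fraction(hits, len(TREES))


print(len(TREES))
print(prob("e1"), prob("e4"), prob("e1", "e4"))
print(prob("e1", "e4") <= prob("e1") * prob("e4"))  # negative correlation
```

Enumeration of this kind is exactly what the rest of the article tries to avoid, but it is a convenient sanity check on small graphs.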

Theorem 1.1 For any finite connected graph $G$, let $T$ be a uniform spanning tree. If $e$
and $f$ are distinct edges, then $P(e, f \in T) \le P(e \in T) P(f \in T)$.

Any spanning tree of an $n$-vertex graph contains $n - 1$ edges, so it should seem
intuitively plausible - even obvious - that if one edge is forced to be in the tree then any other
edge is less likely to be needed. Two proofs will be given later, but neither is
straightforward, and in fact the only proofs I know involve elaborate connections between spanning
trees, random walks and electrical networks. Sections 2 and 3 will be occupied with the
elucidation of these connections. The connection between random walks and electrical
networks will be given more briefly, since an excellent treatment is available [8].

As an indication that the previous theorem is not trivial, here is a slightly stronger
statement, the truth or falsity of which is unknown. Think of the distribution of $T$ as
a probability distribution on the outcome space $\Omega$ consisting of all the $2^{|E(G)|}$ subgraphs
of $G$ that just happens to give probability zero to any subgraph that is not a spanning
tree. An event $A$ (i.e. any subset of the outcome space) is called an up-event - short
for upwardly closed - if whenever a subgraph $H$ of $G$ has a further subgraph $K$ and
$K \in A$, then $H \in A$. An example of an up-event is the event of containing at least
two of the three edges $e_1$, $e_3$ and $e_5$. Say an event $A$ ignores an edge $e$ if for every $H$,
$H \in A \Leftrightarrow H \cup e \in A$.

Conjecture 1 For any finite connected graph G, let T be a uniform spanning tree. Let 
e be any edge and A be any up-event that ignores e. Then 

$$P(A \text{ and } e \in T) \le P(A) P(e \in T).$$

Theorem 1.1 is a special case of this when $A$ is the event of $f$ being in the tree. The
conjecture is known to be true for series-parallel graphs and it is also known to be true
in the case when $A$ is an elementary cylinder event, i.e. the event of containing some
fixed $e_1, \ldots, e_k$. On the negative side, there are natural generalizations of graphs and
spanning trees, namely matroids and bases (see [19] for definitions), and both Theorem 1.1
and Conjecture 1 fail to generalize to this setting. If you're interested in seeing the
counterexample, look at the end of [15].

1.3 The transfer-impedance matrix

The next two paragraphs discuss a theorem that computes probabilities such as $P(e, f \in T)$.
These computations alone would render the theorem useful, but it appears even more
powerful in the context of how strongly it constrains the probability measure governing 
T. Let me elaborate. 

Fix a subset $S = \{e_1, \ldots, e_k\}$ of the edges of a finite connected graph $G$. If $T$ is a
uniform random spanning tree of $G$ then the knowledge of whether $e_i \in T$ for each $i$
partitions the space into $2^k$ possible outcomes. (Some of these may have probability zero
if $S$ contains cycles, but if not, all $2^k$ may be possible.) In any case, choosing $T$ from the
uniform distribution on spanning trees of $G$ induces a probability distribution on $\Omega$, the
space of these $2^k$ outcomes. There are many possible probability distributions on $\Omega$: the
ways of choosing $2^k$ nonnegative numbers summing to one form a $(2^k - 1)$-dimensional space.
Theorem 1.1 shows that the actual measure induced by $T$ satisfies certain inequalities, so
not all probability distributions on $\Omega$ can be gotten in this way. But the set of probability
distributions on $\Omega$ satisfying these inequalities is still $(2^k - 1)$-dimensional. It turns out,
however, that the set of probability distributions on $\Omega$ that arise as induced distributions
of uniform spanning trees on subsets of $k$ edges actually has at most the much smaller
dimension $k(k+1)/2$. This is a consequence of the following theorem which is the bulwark
of our entire discussion of spanning trees:

Theorem 1.2 (Transfer-Impedance Theorem) Let $G$ be any finite connected graph.
There is a symmetric function $H(e, f)$ on pairs of edges in $G$ such that for any
$e_1, \ldots, e_r \in G$,

$$P(e_1, \ldots, e_r \in T) = \det M(e_1, \ldots, e_r)$$

where $M(e_1, \ldots, e_r)$ is the $r$ by $r$ matrix whose $i,j$-entry is $H(e_i, e_j)$.

By inclusion-exclusion, the probability of any event in $\Omega$ may be determined from the
probabilities $P(e_{j_1}, \ldots, e_{j_r} \in T)$ as $e_{j_1}, \ldots, e_{j_r}$ vary over all subsets of $e_1, \ldots, e_k$. The
theorem says that these are all determined by the $k(k+1)/2$ numbers $\{H(e_i, e_j) : i, j \le k\}$,
which shows that there are indeed only $k(k+1)/2$ degrees of freedom in determining
the measure on $\Omega$.

Another way of saying this is that the measure is almost completely determined by its
two-dimensional marginals, i.e. from the values of $P(e, f \in T)$ as $e$ and $f$ vary over pairs
of (not necessarily distinct) edges. To see this, calculate the values of $H(e, f)$. The values
of $H(e, e)$ in the theorem must be equal to $P(e \in T)$ since $P(e \in T) = \det M(e) = H(e, e)$.
To see what $H(e, f)$ is for $e \ne f$, write

$$\begin{aligned}
P(e, f \in T) &= \det M(e, f) \\
&= H(e, e) H(f, f) - H(e, f)^2 \\
&= P(e \in T) P(f \in T) - H(e, f)^2
\end{aligned}$$

and hence

$$H(e, f) = \pm \sqrt{P(e \in T) P(f \in T) - P(e, f \in T)}.$$

Thus the two-dimensional marginals determine $H$ up to sign, and $H$ determines the
measure. Note that the above square root is always real, since by Theorem 1.1 the quantity
under the radical is nonnegative. Section 4 will be devoted to proving Theorem 1.2, the
proof depending heavily on the connections to random walks and electrical networks
developed in Sections 2 and 3.
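As a small worked instance of this formula, take the edges $e_1$ and $e_2$ of $G_1$. The three probabilities below are read off the list of eleven spanning trees in Section 1.1 (eight trees contain $e_1$, eight contain $e_2$, and five contain both); nothing beyond that list is assumed, and the sign of $H(e_1, e_2)$ is left undetermined, as in the text.

```latex
\begin{align*}
P(e_1 \in T) &= \tfrac{8}{11}, \qquad
P(e_2 \in T) = \tfrac{8}{11}, \qquad
P(e_1, e_2 \in T) = \tfrac{5}{11}, \\
H(e_1, e_2)^2 &= P(e_1 \in T)\, P(e_2 \in T) - P(e_1, e_2 \in T)
              = \tfrac{64}{121} - \tfrac{55}{121} = \tfrac{9}{121}, \\
H(e_1, e_2) &= \pm \tfrac{3}{11}, \\
\det M(e_1, e_2) &= \tfrac{8}{11} \cdot \tfrac{8}{11} - \left(\tfrac{3}{11}\right)^2
                 = \tfrac{55}{121} = \tfrac{5}{11}.
\end{align*}
```

The last line merely confirms consistency for two edges; the content of Theorem 1.2 is that the same numbers $H(e_i, e_j)$ also give the probabilities for three or more edges.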

1.4 Applications of transfer-impedance to limit theorems

Let $K_n$ denote the complete graph on $n$ vertices, i.e. there are no self-edges and precisely
one edge connecting each pair of distinct vertices. Imagine picking a uniform random
spanning tree of $K_n$ and letting $n$ grow to infinity. What kind of limit theorem might we
expect? Since a spanning tree of $K_n$ has only $n - 1$ edges, each of the $n(n-1)/2$ edges
should have probability $2/n$ of being in the tree (by symmetry) and is hence decreasingly
likely to be included as $n \to \infty$. On the other hand, the number of edges incident to
each vertex is increasing. Say we fix a particular vertex $v_n$ in each $K_n$ and look at the
number of edges incident to $v_n$ that are included in the tree. Each of the $n - 1$ incident
edges has probability $2/n$ of being included, so the expected number of such edges is
$2(n-1)/n$, which is evidently converging to 2. If the inclusion of each of these $n - 1$
edges in the tree were independent of each other, then the number of edges incident to
$v_n$ in $T$ would be a binomial random variable with parameters $(n-1, 2/n)$; the well
known Poisson limit theorem would then say that the random variable $D_T(v_n)$ counting
how many edges incident to $v_n$ are in $T$ converged as $n \to \infty$ to a Poisson distribution
with mean two. (A quick explanation: integer-valued random variables $X_n$ are said to
converge to $X$ in distribution if $P(X_n = k) \to P(X = k)$ for all integers $k$. In this
instance, convergence of $D_T(v_n)$ to a Poisson of mean two would mean that
$P(D_T(v_n) = k) \to e^{-2} 2^k / k!$ as $n \to \infty$ for each integer $k \ge 0$.) Unfortunately this can't be
true because a Poisson(2) is sometimes zero, whereas $D_T(v_n)$ can never be zero. It has
however been shown [2] that $D_T(v_n)$ converges in distribution to the next simplest thing:
one plus a Poisson of mean one.
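To make the comparison concrete, here is a minimal numerical sketch of the three distributions in play: the binomial with parameters $(n-1, 2/n)$ that would arise under the (false) independence heuristic, its Poisson(2) limit, and the one-plus-Poisson(1) law that is the true limit. The function names and the choice $n = 10000$ are illustrative assumptions, not part of the argument.

```python
from math import comb, exp, factorial


def binomial_pmf(k, trials, p):
    """P(Binomial(trials, p) = k)."""
    return comb(trials, k) * p**k * (1 - p) ** (trials - k)


def poisson_pmf(k, mean):
    """P(Poisson(mean) = k)."""
    return exp(-mean) * mean**k / factorial(k)


def shifted_poisson_pmf(k):
    """P(1 + Poisson(1) = k); zero when k = 0."""
    return poisson_pmf(k - 1, 1) if k >= 1 else 0.0


n = 10_000
print(" k  binomial  Poisson(2)  1+Poisson(1)")
for k in range(6):
    print(f"{k:2d}  {binomial_pmf(k, n - 1, 2 / n):.5f}   "
          f"{poisson_pmf(k, 2):.5f}     {shifted_poisson_pmf(k):.5f}")
```

The first two columns agree closely, while the third puts no mass at zero, which is the feature the heuristic misses.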

To show you why this really is the next best thing, let me point out a property of the 
mean one Poisson distribution. Pretend that if you picked a family in the United States 
at random, then the number of children in the family would have a Poisson distribution 
with mean one (population control having apparently succeeded). Now imagine picking 
a child at random instead of picking a family at random, and asking how many children 
in the family. You would certainly get a different distribution, since you couldn't ever 
get the answer zero. In fact you would get one plus a Poisson of mean one. (Poisson 
distributions are the only ones with this property.) Thus a Poisson-plus-one distribution 
is a more natural distribution than it looks at first. At any rate, the convergence theorem
is



Theorem 1.3 Let $D_T(v_n)$ be the random degree of the vertex $v_n$ in a uniform spanning
tree of $K_n$. Then as $n \to \infty$, $D_T(v_n)$ converges in distribution to $X$ where $X$ is one plus
a Poisson of mean one.
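The family-versus-child property of the mean one Poisson distribution described before the theorem is easy to check numerically. In the sketch below, picking a child at random weights a family of size $k$ by $k$, so the sampled family size has probability $k P(X = k) / E X$; the code compares this with the law of one plus a Poisson of mean one. The names `size_biased_pmf` and `cutoff` are hypothetical, and the infinite sum for $E X$ is truncated at `cutoff` terms.

```python
from math import exp, factorial, isclose


def poisson_pmf(k, mean=1.0):
    """P(Poisson(mean) = k); zero for negative k."""
    if k < 0:
        return 0.0
    return exp(-mean) * mean**k / factorial(k)


def size_biased_pmf(k, mean=1.0, cutoff=60):
    """P(a uniformly chosen child belongs to a family with k children)."""
    expected = sum(j * poisson_pmf(j, mean) for j in range(cutoff))
    return k * poisson_pmf(k, mean) / expected


for k in range(6):
    # second column: family size seen from a random child
    # third column: P(1 + Poisson(1) = k)
    print(k, round(size_biased_pmf(k), 6), round(poisson_pmf(k - 1), 6))
```

The two columns coincide because $k \cdot e^{-1}/k! = e^{-1}/(k-1)!$ for $k \ge 1$.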

Consider now the $n$-cube $B_n$. Its vertices are defined to be all strings of zeros and
ones of length $n$, where two vertices are connected by an edge if and only if they differ in
precisely one location. Fix a vertex $v_n \in B_n$ and play the same game: choose a uniform
random spanning tree and let $D_T(v_n)$ be the random degree of $v_n$ in the tree. It is not
hard to see again that the expected value, $E D_T(v_n)$, converges to 2 as $n \to \infty$. Indeed, for any
graph the number of edges in a spanning tree is one less than the number of vertices,
and since each edge has two endpoints the average degree of the vertices will be $\approx 2$;
if the graph is symmetric, each vertex will then have the same expected degree, which
must be $\approx 2$ as well. One could expect Theorem 1.3 to hold for $B_n$ as well as $K_n$ and in fact it
does. A proof of this for a class of sequences of graphs that includes both $K_n$ and $B_n$
and does not use transfer-impedances appears in [2] along with the conjecture that the
result should hold for more general sequences of graphs. This can indeed be established,
and in Section 5 we will discuss the proof of Theorem 1.3 via transfer-impedances which
can be extended to more general sequences of graphs.

The convergence in distribution of $D_T(v_n)$ in these theorems is actually a special case
of a stronger kind of convergence. To begin discussing this stronger kind of convergence,
imagine that we pick a uniform random spanning tree of a graph, say $K_n$, and want to
write down what it looks like "near $v_n$". Interpret "near $v_n$" to mean within a distance
of $r$ of $v_n$, where $r$ is some arbitrary positive integer. The answer will be a rooted tree
of height $r$. (A rooted tree is a tree plus a choice of one of its vertices, called the root.
The height of a rooted tree is the maximum distance of any vertex from the root.) The
rooted tree representing $T$ near $v_n$ will be the tree you get by picking up $T$, dangling it
from $v_n$, and ignoring everything more than $r$ levels below the top.

Call this the $r$-truncation of $T$, written $T \wedge_{v_n} r$ or just $T \wedge r$ when the choice of $v_n$
is obvious. For example, suppose $r = 2$, $v_n$ has 2 neighbors in $T$, $w_1$ and $w_2$, $w_1$ has
3 neighbors other than $v_n$ in $T$ and $w_2$ has none. This information is encoded in the
following picture. The picture could also have been drawn with left and right reversed,
since we consider this to be the same abstract tree, no matter how it is drawn.




figure 2 

When $r = 1$, the only information in $T \wedge r$ is the number of children of the root, i.e.
$D_T(v_n)$. Thus the previous theorem asserts the convergence in distribution of $T \wedge_{v_n} 1$ to
a root with a (1+Poisson) number of children. Generalizing this is the following theorem,
proved in Section 5.

Theorem 1.4 For any $r \ge 1$, as $n \to \infty$, $T \wedge_{v_n} r$ converges in distribution to a particular
random tree, $\mathcal{P}_1 \wedge r$, to be defined later.

Convergence in distribution means that for any fixed tree $t$ of height at most $r$,
$P(T \wedge_{v_n} r = t)$ converges as $n \to \infty$ to the probability of the random tree $\mathcal{P}_1 \wedge r$ equalling $t$. As
the notation indicates, the random tree $\mathcal{P}_1 \wedge r$ is the $r$-truncation of an infinite random
tree. It is in fact the tree of a Poisson(1) branching process conditioned to live forever,
but these terms will be defined later, in Section 5. The theorem is stated here only for
the sequence $K_n$, but is in fact true for a more general class of sequences, which includes



2 Spanning trees and random walks 

Unless $G$ is a very small graph, it is virtually impossible to list all of its spanning trees.
For example, if $G = K_n$ is the complete graph on $n$ vertices, then the number of spanning
trees is $n^{n-2}$ according to the well known Prüfer bijection [17]. If $n$ is much bigger than
say 20, this is too many to be enumerated even by the snazziest computer that ever
will be. Luckily, there are shortcuts which enable us to compute probabilities such as
$P(e \in T)$ without actually enumerating all spanning trees and counting the proportion
containing $e$. The shortcuts are based on a close correspondence between spanning trees
and random walks, which is the subject of this section.

2.1 Simple random walk 

Begin by defining a simple random walk on $G$. To avoid obscuring the issue, we will
place extra assumptions on the graph $G$ and later indicate how to remove these. In
particular, in addition to assuming that $G$ is finite and connected, we will often suppose
that it is $D$-regular for some positive integer $D$, which means that every vertex has
precisely $D$ edges incident to it. Also suppose that $G$ is simple, i.e. it has no self-edges
or parallel edges (different edges connecting the same pair of vertices). For any vertex
$x \in G$, define a simple random walk on $G$ starting at $x$, written $\mathrm{SRW}_x^G$, intuitively
as follows. Imagine a particle beginning at time 0 at the vertex $x$. At each future
time $1, 2, 3, \ldots$, it moves along some edge, always choosing among the $D$ edges incident
to the vertex it is currently at with equal probability. When $G$ is not $D$-regular, the
definition will be the same: each of the edges leading away from the current position $v$
will be chosen with probability $1/\mathrm{degree}(v)$. This defines a sequence of random positions
$\mathrm{SRW}_x^G(0), \mathrm{SRW}_x^G(1), \mathrm{SRW}_x^G(2), \ldots$ which is thus a random function $\mathrm{SRW}_x^G$ (or just
SRW if $x$ and $G$ may be understood without ambiguity) from the nonnegative integers to
the vertices of $G$. Formally, this random function may be defined by its finite-dimensional
marginals which are given by

$$P(\mathrm{SRW}_x^G(0) = y_0, \mathrm{SRW}_x^G(1) = y_1, \ldots, \mathrm{SRW}_x^G(k) = y_k) = D^{-k}$$

if $y_0 = x$ and for all $i = 1, \ldots, k$ there is an edge from $y_{i-1}$ to $y_i$, and zero otherwise.
For an illustration of this definition, let $G$ be the following 3-regular simple graph.




figure 3 



Consider a simple random walk $\mathrm{SRW}_A^G$ starting at the vertex $A$. The probability of
a particular beginning, say $\mathrm{SRW}(1) = B$ and $\mathrm{SRW}(2) = F$, is just $(1/3)^2$. The random
position at time 2, $\mathrm{SRW}(2)$, is then equal to $F$ with probability $2/9$, since each of the
two ways, $ABF$ and $AEF$, of getting to $F$ in two steps has probability $1/9$.
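This computation can be reproduced by pushing the starting distribution through the transition rule twice. The adjacency lists below are an assumption, since the drawing of figure 3 is not reproduced here: they describe a 3-regular simple graph on the six vertices $A, \ldots, F$ that is consistent with the steps mentioned in this section (in particular $D$ is not adjacent to $F$, so that $ABF$ and $AEF$ are the only two-step routes from $A$ to $F$). The function names are illustrative.

```python
from fractions import Fraction

# Assumed adjacency lists for the graph of figure 3 (see the lead-in):
# triangles A-D-E and B-C-F joined by the edges AB, DC and EF.
GRAPH = {
    "A": ["B", "D", "E"],
    "B": ["A", "C", "F"],
    "C": ["B", "D", "F"],
    "D": ["A", "C", "E"],
    "E": ["A", "D", "F"],
    "F": ["B", "C", "E"],
}


def step(dist):
    """Advance the distribution of the walk's position by one time unit."""
    new = {v: Fraction(0) for v in GRAPH}
    for v, mass in dist.items():
        for w in GRAPH[v]:
            new[w] += mass / len(GRAPH[v])  # each incident edge equally likely
    return new


def distribution_at(start, time):
    """Exact distribution of SRW(time) for the walk started at `start`."""
    dist = {v: Fraction(int(v == start)) for v in GRAPH}
    for _ in range(time):
        dist = step(dist)
    return dist


two_steps = distribution_at("A", 2)
print(two_steps["F"])
```

Using `Fraction` keeps the probabilities exact, so the result can be compared directly with the value $2/9$ computed by hand.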

Another variant of random walk we will need is the stationary Markov chain
corresponding to a simple random walk on $G$. I will preface this definition with a quick
explanation of Markov chains; since I cannot do justice to this large topic in two
paragraphs, the reader is referred to [11], [9] or any other favorite introductory probability
text for further details.

A (time-homogeneous) Markov chain on a finite state space $S$ is a sequence of random
variables $\{X_i\}$ taking values in $S$, indexed by either the integers or the nonnegative
integers and having the Markov property: there is a set of transition probabilities
$\{p(x, y) : x, y \in S\}$ so that the probability of $X_{i+1}$ being $y$, conditional upon $X_i = x$, is
always equal to $p(x, y)$ regardless of how much more information about the past you have.
(Formally, this means $P(X_{i+1} = y \mid X_i = x \text{ and any values of } X_j \text{ for } j < i)$ is still $p(x, y)$.)
An example of this is $\mathrm{SRW}_x^G$, where $S$ is the set of vertices of $G$ and $p(x, y) = D^{-1}$ if
$x \sim y$ and 0 otherwise (recall that $x \sim y$ means $x$ is a neighbor of $y$). The values $p(x, y)$
must satisfy $\sum_y p(x, y) = 1$ for every $x$ in order to be legitimate conditional probabilities.
If in addition they satisfy $\sum_x p(x, y) = 1$ for every $y$, the Markov chain is said to be doubly
stochastic. It will be useful later to know that the Markov property is time-reversible,
meaning if $\{X_i\}$ is a Markov chain then so is the sequence $\{\tilde{X}_i = X_{-i}\}$, and there are
backwards transition probabilities $\tilde{p}(x, y)$ for which $P(X_{i-1} = y \mid X_i = x) = \tilde{p}(x, y)$.

If it is possible eventually to get from every state in $S$ to every other, then there is
a unique stationary distribution which is a set of probabilities $\{\pi(x) : x \in S\}$ summing
to one and having the property that $\sum_x \pi(x) p(x, y) = \pi(y)$ for all $y$. Intuitively, this
means that if we build a Markov chain with transition probabilities $p(x, y)$ and start it by
randomizing $X_0$ so that $P(X_0 = x) = \pi(x)$ then it will also be true that $P(X_i = x) = \pi(x)$
for every $i > 0$. A stationary Markov chain is one indexed by the integers (as opposed
to just the positive integers), in which $P(X_i = x) = \pi(x)$ for some, hence every $i$. If a
Markov chain is doubly stochastic, it is easy to check that the uniform distribution $U$ is
stationary:

$$\sum_x U(x) p(x, y) = \sum_x |S|^{-1} p(x, y) = |S|^{-1} = U(y).$$

The stationary distribution $\pi$ is unique (assuming every state can get to every other) and
is hence uniform over all states.

Now we define a stationary simple random walk on $G$ to be a stationary Markov
chain with state space $V(G)$ and transition probabilities $p(x, y) = D^{-1}$ if $x \sim y$ and
0 otherwise. Intuitively this can be built by choosing $X_0$ at random uniformly over $V(G)$,
then choosing the $X_i$ for $i > 0$ by walking randomly from $X_0$ along the edges and choosing
the $X_i$ for $i < 0$ also by walking randomly from $X_0$, thinking of this latter walk as going
backwards in time. (For SRW, $p(x, y) = p(y, x) = \tilde{p}(x, y)$ so the walk looks the same
backwards as forwards.)

2.2 The random walk construction of uniform spanning trees 

Now we are ready for the random walk construction of uniform random spanning trees. 
What we will actually get is a directed spanning tree, which is a spanning tree together 
with a choice of vertex called the root and an orientation on each edge (an arrow pointing 
along the edge in one of the two possible directions) such that following the arrows always 
leads to the root. Of course a directed spanning tree yields an ordinary spanning tree if 
you ignore the arrows and the root. Here is an algorithm to generate directed trees from 
random walks. 

GROUNDSKEEPER'S ALGORITHM 

Let $G$ be a finite, connected, $D$-regular, simple graph and let $x$ be any
vertex of $G$. Imagine that we send the groundskeeper from the local baseball
diamond on a walk along the edges of $G$ starting from $x$; later we will take this to
be the walk $\mathrm{SRW}_x^G$. She brings with her the wheelbarrow full of chalk used
to draw in lines. This groundskeeper is so eager to choose a spanning tree for 
G that she wants to chalk a line over each edge she walks along. Of course if 
that edge, along with the edges she's already chalked, would form a cycle (or 
is already chalked), she is not allowed to chalk it. In this case she continues 
walking that edge but temporarily - and reluctantly - shuts off the flow of 
chalk. Every time she chalks a new edge she inscribes an arrow pointing from 
the new vertex back to the old.

Eventually every vertex is connected to every other by a chalked path, so no more
can be added without forming a cycle and the chalking is complete. It is easy to see 
that the subgraph consisting of chalked edges is always a single connected component. 
The first time the walk reaches a vertex y, the edge just travelled cannot form a cycle 
with the other chalked edges. Conversely, if the walk moves from z to some y that has 
been reached before, then y is connected to z already by some chalked path, so adding 
the edge zy would create a cycle and is not permitted. Also it is clear that following the 
arrows leads always to vertices that were visited previously, and hence eventually back 
to the root. Furthermore, every vertex except x has exactly one oriented edge leading 
out of it, namely the edge along which the vertex was first reached. 

Putting this all together, we have defined a function - say $\tau$ - from walks on $G$
(infinite sequences of vertices each consecutive pair connected by an edge) to directed
spanning trees of $G$. Formally $\tau(y_0, y_1, y_2, \ldots)$ is the subgraph $H \subseteq G$ such that if $e$ is
an oriented edge from $z$ to $w$ then

$$e \in H \iff \text{for some } k > 0,\ y_k = z,\ y_{k-1} = w, \text{ and there is no } j < k \text{ such that } y_j = z.$$

As an example, suppose $\mathrm{SRW}_A^G$ on the graph of figure 3 begins $ABFBCDAE$. Then applying
$\tau$ gives the tree with edges $BA$, $FB$, $CB$, $DC$ and $EA$.
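The map from walks to directed trees is short enough to write out. The sketch below is one possible rendering, with the hypothetical name `tau`: it scans the walk once and records, for each vertex other than the starting point, the vertex from which it was first entered. A directed tree is represented as a dictionary sending each non-root vertex to the vertex its arrow points at; that representation is a choice made here, not something fixed by the text.

```python
def tau(walk):
    """Directed tree of first-entrance edges of a walk.

    `walk` is a sequence of vertices; the result maps each newly reached
    vertex to the vertex the walk came from, i.e. the arrow points from the
    new vertex back to the old one. The root is walk[0] and has no arrow.
    """
    arrows = {}
    root = walk[0]
    for old, new in zip(walk, walk[1:]):
        if new != root and new not in arrows:  # first visit to `new`
            arrows[new] = old
    return arrows


tree = tau("ABFBCDAE")
print(sorted(new + old for new, old in tree.items()))
```

On the example walk this prints the five edges $BA$, $CB$, $DC$, $EA$ and $FB$ (in alphabetical order), matching the tree found by hand.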

To be completely formal, I should admit that the groundskeeper's algorithm never
stops if there is a vertex that the walk fails to hit in finite time. This is not a problem
since we are going to apply $\tau$ to the path of a SRW, and this hits every vertex with
probability one. As hinted earlier, the importance of this construction is the following
equivalence.

Theorem 2.1 Let $G$ be any finite, connected, $D$-regular, simple graph and let $x$ be any
vertex of $G$. Run a simple random walk $\mathrm{SRW}_x^G$ and let $T$ be the random spanning tree
gotten by ignoring the arrows and root of the random directed spanning tree $\tau(\mathrm{SRW}_x^G)$.
Then $T$ has the distribution of a uniform random spanning tree.

To prove this it is necessary to consider a stationary simple random walk on $G$
($\mathrm{SSRW}^G$). It will be easy to get back to a $\mathrm{SRW}_x^G$ because the sites visited in positive
time by a SSRW conditioned on being at $x$ at time zero form a $\mathrm{SRW}_x^G$. Let $T_n$ be
the tree $\tau(\mathrm{SSRW}(n), \mathrm{SSRW}(n+1), \ldots)$; in other words, $T_n$ is the directed tree gotten
by applying the groundskeeper's algorithm to the portion of the stationary simple random
walk from time $n$ onwards. The first goal is to show that the random collection of
directed trees $T_n$ forms a time-homogeneous Markov chain as $n$ ranges over all integers.

Showing this is pretty straightforward because the transition probabilities are easy 
to see. First note that if t and u are any two directed trees on disjoint sets of vertices, 
rooted respectively at v and w, then adding any arrow from v to a vertex in u combines 
them into a single tree rooted at w. Now define two operations on directed spanning 
trees of G as follows. 

Operation $F(t, x)$: Start with a directed tree $t$ rooted at $v$. Choose one of the $D$
neighbors of $v$ in $G$, say $x$. Take away the edge in $t$ that leads out of $x$, separating $t$ into
two trees, rooted at $v$ and $x$. Now add an edge from $v$ to $x$, resulting in a single tree
$F(t, x)$.

Operation $F^{-1}(t, w)$: Start with a directed tree $t$ rooted at $x$. Choose one of the $D$
neighbors of $x$ in $G$, say $w$. Follow the path from $w$ to $x$ in $t$ and let $v$ be the last vertex
on this path before $x$. Take away the edge in $t$ that leads out of $v$, separating $t$ into two
trees, rooted at $x$ and $v$. Now add an edge from $x$ to $w$, resulting in a single directed tree
$F^{-1}(t, w)$.
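The two operations can be sketched directly on a dictionary representation of directed trees, in which each non-root vertex is mapped to the vertex its arrow points at. This representation, the explicit `root` argument and the function names `F` and `F_inverse` are assumptions made for illustration; the functions also trust the caller that the chosen vertex really is a neighbor of the root in $G$ rather than checking it. The starting tree is the one obtained from the walk $ABFBCDAE$ in the earlier example.

```python
def F(tree, root, x):
    """Operation F(t, x): returns the new tree and its root, which is x."""
    new_tree = dict(tree)
    del new_tree[x]        # take away the edge leading out of x
    new_tree[root] = x     # add an edge from the old root v to x
    return new_tree, x


def F_inverse(tree, root, w):
    """Operation F^{-1}(t, w): returns the new tree and its root v."""
    v = w
    while tree[v] != root:  # follow the arrows from w towards the root x
        v = tree[v]         # ends at the last vertex before the root
    new_tree = dict(tree)
    del new_tree[v]         # take away the edge leading out of v
    new_tree[root] = w      # add an edge from the old root x to w
    return new_tree, v


t = {"B": "A", "F": "B", "C": "B", "D": "C", "E": "A"}  # rooted at A
for x in ["B", "D", "E"]:            # the neighbors of the root A
    w = t[x]                         # other endpoint of the edge out of x
    u, new_root = F(t, "A", x)
    back, old_root = F_inverse(u, new_root, w)
    print(x, back == t, old_root)
```

Each line of output reports whether applying $F$ and then $F^{-1}$ recovered the original tree and root, which is the inverse relationship between the two operations.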

It is easy to see that these operations really are inverse to each other, i.e. if $t$ is rooted
at $v$ then $F^{-1}(F(t, x), w) = t$ for any $x \sim v$, where $w$ is the other endpoint of the edge
leading out of $x$ in $t$. Here is a pictorial example.

figure 4



I claim that for any directed trees $t$ and $u$, the backward transition probability
$\tilde{p}(t, u)$ is equal to $D^{-1}$ if $u = F(t, x)$ for some $x$ and zero otherwise. To see this, it
is just a matter of realizing where the operation $F$ comes from. Remember that $T_n$
is just $\tau(\mathrm{SSRW}(n), \mathrm{SSRW}(n+1), \ldots)$, so in particular the root of $T_n$ is $\mathrm{SSRW}(n)$.
Now SSRW really is a Markov chain. We already know that
$P(\mathrm{SSRW}(n-1) = x \mid \mathrm{SSRW}(n) = v)$ is $D^{-1}$ if $x \sim v$ and zero otherwise. Also, this is unaffected by
knowledge of $\mathrm{SSRW}(j)$ for any $j > n$. Suppose it turns out that $\mathrm{SSRW}(n-1) = x$.
Then knowing only $T_n$ and $x$ (but not the values of $\mathrm{SSRW}(j)$ for $j > n$) it is possible
to work out what $T_{n-1}$ is. Remember that $T_n$ and $T_{n-1}$ come from applying $\tau$ to
the respective sequences $\mathrm{SSRW}(n), \mathrm{SSRW}(n+1), \ldots$ and $\mathrm{SSRW}(n-1), \mathrm{SSRW}(n), \ldots$
whose only difference is that the second of these has an extra $x$ tacked on the beginning.
Every time the first sequence reaches a vertex for the first time, so does the second,
unless that vertex happens to be $x$. So $T_{n-1}$ has all the oriented edges of $T_n$
except the one out of $x$. What it has instead is an oriented edge from $v$ to $x$, chalked in
by the groundskeeper at her very first step. Adding in the edge from $v$ to some neighbor
$x$ and erasing the edge out of $x$ yields precisely $F(t, x)$. So we have shown that
$T_{n-1} = F(T_n, \mathrm{SSRW}(n-1))$. But $\mathrm{SSRW}(n-1)$ is uniformly distributed among the
neighbors of $\mathrm{SSRW}(n)$ no matter what other information we know about the future.
This proves the claim and the time-homogeneous Markov property.

The next thing to show is that the stationary distribution is uniform over all directed
trees. As we've seen, this would follow if we knew that $\{T_n\}$ was doubly stochastic. Since
$\tilde{p}(t, u)$ is $D^{-1}$ whenever $u = F(t, x)$ for some $x$ and zero otherwise, this would be true
if for every tree $u$ there are precisely $D$ trees $t$ for which $F(t, x) = u$ for some $x$. But
the trees $t$ for which $F(t, x) = u$ for some $x$ are precisely the trees $F^{-1}(u, x)$ for some
neighbor $x$ of the root of $u$, hence there are $D$ such trees and the transition probabilities for
$\{T_n\}$ are doubly stochastic.

Now that the stationary distribution for $\{T_n\}$ has been shown to be uniform, the
proof of Theorem 2.1 is almost done. Note that the event $SSRW(0) = x$ is the same
as the event of $\tau(SSRW(0), SSRW(1), \ldots)$ being rooted at $x$. Since $SRW_x$ is just
SSRW conditioned on $SSRW(0) = x$, $T_0(SRW_x)$ is distributed as a uniform directed
spanning tree conditioned on being rooted at $x$. That is to say, $T_0(SRW_x)$ is uniformly
distributed over all directed spanning trees rooted at $x$. But ordinary spanning trees
are in a one to one correspondence with directed spanning trees rooted at a fixed vertex
$x$, the correspondence being that to get from the ordinary tree to the directed tree you
name $x$ as the root and add arrows that point toward $x$. Then the tree $T$ gotten from
$T_0(SRW_x)$ by ignoring the root and the arrows is uniformly distributed over all ordinary
spanning trees of $G$, which is what we wanted to prove. □
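
The random walk construction is short enough to sketch in code. The Python fragment below is only an illustration, not part of the original development: the function names and the adjacency-list input format are assumptions made for the example. It runs a simple random walk from the root and records, for every other vertex, the edge along which that vertex was first entered, oriented back toward the root.

```python
import random

def random_walk_tree(adj, start, rng=random):
    """Directed spanning tree built from a simple random walk started at `start`.

    `adj` maps each vertex of a connected simple graph to a list of its neighbors
    (hypothetical input format). The result maps every vertex other than `start`
    to the vertex from which the walk first entered it, so arrows point toward
    the root `start`.
    """
    parent = {}
    visited = {start}
    current = start
    while len(visited) < len(adj):
        nxt = rng.choice(adj[current])   # uniform step to a neighbor
        if nxt not in visited:           # first visit: chalk in the edge
            visited.add(nxt)
            parent[nxt] = current
        current = nxt
    return parent

def undirected_edges(parent):
    """Forget the root and the arrows, keeping only the set of tree edges."""
    return {frozenset((v, p)) for v, p in parent.items()}
```

Calling `undirected_edges(random_walk_tree(adj, x))` gives one sample of the tree $T$ of Theorem 2.1. In the literature this procedure is commonly called the Aldous-Broder algorithm.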

2.3 Weighted graphs 

It is time to remove the extra assumptions that G is D-regular and simple. It will make 
sense later to generalize from graphs to weighted graphs, and since the generalization of 
Theorem 2.1 is as easy for weighted graphs as for unweighted graphs, we may as well 
introduce weights now. 

A weighted graph is just a graph to each edge $e$ of which is assigned a positive real
number called its weight and written $w(e)$. Edge weights are not allowed to be zero,
though one may conceptually identify a graph with an edge of weight zero with the
same graph minus the edge in question. An unweighted graph may be thought of as a
graph with all edge weights equal to one, as will be clear from the way random trees and
random walks generalize. Write $d(v)$ for the sum of the weights of all edges incident to $v$.
Corresponding to the old notion of a uniform random spanning tree is the weight-selected
random spanning tree (WST). A WST, $T$, is defined to have

$$ P(T = t) = \frac{\prod_{e \in t} w(e)}{\sum_u \prod_{e \in u} w(e)} $$

so that the probability of any individual tree is proportional to its weight, which is by
definition the product of the weights of its edges.
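
For a very small graph the WST distribution can be tabulated by brute force. The following Python sketch is an illustration under assumed conventions (vertices numbered from 0, edges given as `(x, y, weight)` triples, and function names of my own choosing): it tries every set of $n - 1$ edges, keeps the acyclic ones, and normalizes the products of weights.

```python
from fractions import Fraction
from itertools import combinations

def spanning_trees(num_vertices, edges):
    """Yield each spanning tree as a tuple of edge indices.

    `edges` is a list of (x, y, weight) with vertices 0..num_vertices-1.
    Parallel edges are distinct edges; a self-loop can never be in a tree.
    """
    for subset in combinations(range(len(edges)), num_vertices - 1):
        leader = list(range(num_vertices))   # union-find forest

        def find(v):
            while leader[v] != v:
                v = leader[v]
            return v

        acyclic = True
        for i in subset:
            rx, ry = find(edges[i][0]), find(edges[i][1])
            if rx == ry:                     # edge would close a cycle
                acyclic = False
                break
            leader[rx] = ry
        if acyclic:                          # n-1 acyclic edges span the graph
            yield subset

def wst_distribution(num_vertices, edges):
    """P(T = t) for every spanning tree t: weight of t over the total weight."""
    weight = {}
    for tree in spanning_trees(num_vertices, edges):
        w = Fraction(1)
        for i in tree:
            w *= edges[i][2]
        weight[tree] = w
    total = sum(weight.values())
    return {tree: w / total for tree, w in weight.items()}
```

On a triangle with edge weights 1, 2 and 3 the three spanning trees have weights 2, 3 and 6, so `wst_distribution(3, [(0, 1, 1), (1, 2, 2), (0, 2, 3)])` assigns them probabilities 2/11, 3/11 and 6/11.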

Corresponding to a simple random walk from a vertex $x$ is the weighted random walk
from $x$, $WRW_x^G$, which is a Markov chain in which the transition probabilities from a
vertex $v$ are proportional to the weights of the edges incident to $v$ (among which the walk
must choose). Thus if $v$ has two neighbors $w$ and $x$, and there are four edges incident
to $v$ with respective weights 1, 2, 3 and 4 that connect $v$ respectively to itself, $w$, $x$ and
$x$, then the probabilities of choosing these four edges are respectively 1/10, 2/10, 3/10
and 4/10. Formally, the probability of walking along an edge $e$ incident to the current
position $v$ is given by $w(e)/d(v)$. The bookkeeping is a little unwieldy since knowing
the sequence of vertices $WRW(0), WRW(1), \ldots$ visited by the WRW does not necessarily
determine which edges were travelled now that the graph is not required to be simple.
Rather than invent some clumsy ad hoc notation to include the edges, it is easier just to
think that a WRW includes this information, so that it is not simply given by its positions
$WRW(j)$, $j \geq 0$, and to refer to this information in words when necessary.
If $G$ is a connected weighted graph then $WRW^G$ has a unique stationary distribution
denoted by positive numbers $\pi^G(v)$ summing to one. This will not in general be uniform,
but its existence is enough to guarantee the existence of a stationary Markov chain with
the same transition probabilities. We call this stationary Markov chain SWRW the few
times the need arises. The new and improved theorem then reads:

Theorem 2.2 Let $G$ be any finite, connected weighted graph and let $x$ be any vertex of
$G$. Run a weighted random walk $WRW_x^G$ and let $T$ be the random spanning tree gotten
by ignoring the arrows and root of the random directed spanning tree $\tau(WRW_x^G)$. Then
$T$ has the distribution of WST.

The proof of Theorem 2.1 serves for Theorem 2.2 with a few alterations. These will
now be described, though not much would be lost by taking these details on faith and
skipping to the next section.

The groundskeeper's algorithm is unchanged with the provision that the WRW brings
with it the information of which edge she should travel if more than one edge connects
$WRW(i)$ to $WRW(i+1)$ for some $i$. The operation to get from the directed tree $T_n$
to a candidate for $T_{n-1}$ is basically the same, only instead of there being $D$ choices for
how to do this there is one choice for each edge incident to the root $v$ of $T_n$: choose
such an edge, add it to the tree oriented from $v$ to its other endpoint $x$ and remove the
edge out of $x$. It is easy to see again that $\{T_n\}$ is a time-homogeneous Markov chain
with transition probability from $t$ to $u$ zero unless $u$ can be gotten from $t$ by the above
operation, and if so the probability is proportional to the weight of the edge that was
added in the operation. (This is because if $T_n = t$ then $T_{n-1} = u$ if and only if $u$ can be
gotten from this operation and WRW travelled along the edge added in this operation
between times $n-1$ and $n$.)

The uniform distribution on vertices is no longer stationary for WRW since we no
longer have $D$-regularity, but the distribution $\pi(v) = d(v)/\sum_x d(x)$ is easily seen to be
stationary: start a WRW with $WRW(0)$ having distribution $\pi$; then

$$
\begin{aligned}
P(WRW(1) = v) &= \sum_x P(WRW(0) = x \text{ and } WRW(1) = v) \\
&= \sum_x \frac{d(x)}{\sum_y d(y)} \sum_{e \text{ connecting } x \text{ to } v} \frac{w(e)}{d(x)} \\
&= \frac{1}{\sum_y d(y)} \sum_{e \text{ incident to } v} w(e) \\
&= \pi(v).
\end{aligned}
$$
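
The stationarity computation can be checked exactly on the four-edge example given earlier in this section (a self-loop at $v$ of weight 1, an edge to $w$ of weight 2, and two edges to $x$ of weights 3 and 4). The Python sketch below is an illustration only; the function names and the `(x, y, weight)` edge format are assumptions of the example, and a self-loop is counted once in $d(v)$, as in the text.

```python
from fractions import Fraction

def degrees(vertices, edges):
    """d(v): total weight of the edges incident to v (a self-loop counts once)."""
    d = {v: Fraction(0) for v in vertices}
    for x, y, w in edges:
        d[x] += w
        if y != x:
            d[y] += w
    return d

def stationary(vertices, edges):
    """The candidate stationary distribution pi(v) = d(v) / sum_x d(x)."""
    d = degrees(vertices, edges)
    total = sum(d.values())
    return {v: d[v] / total for v in vertices}

def step_distribution(vertices, edges, start):
    """Distribution of WRW(1) when WRW(0) has distribution `start`."""
    d = degrees(vertices, edges)
    out = {v: Fraction(0) for v in vertices}
    for x, y, w in edges:
        out[y] += start[x] * w / d[x]        # travel the edge from x to y
        if y != x:
            out[x] += start[y] * w / d[y]    # travel the edge from y to x
    return out

vertices = ["v", "w", "x"]
edges = [("v", "v", 1), ("v", "w", 2), ("v", "x", 3), ("v", "x", 4)]
pi = stationary(vertices, edges)
```

Here `step_distribution(vertices, edges, pi)` returns `pi` again, while starting from the point mass at `v` gives the walk probability 1/10 of staying at `v`, 2/10 of moving to `w` and 7/10 of moving to `x`.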

The stationary distribution $\pi$ for the Markov chain $\{T_n\}$ gives a directed tree $t$ rooted
at $v$ probability

$$ \pi(t) = K d(v) \prod_{e \in t} w(e), $$

where $K = \left(\sum_t d(\mathrm{root}(t)) \prod_{e \in t} w(e)\right)^{-1}$ is a normalizing constant. If $t$, rooted at $v$, can
go to $u$, rooted at $x$, by adding an edge $e$ and removing the edge $f$, then
$\pi(u)/\pi(t) = d(x)w(e)/(d(v)w(f))$. To verify that it is a stationary distribution for $T_n$ write $C(u)$ for
the class of trees from which it is possible to get to $u$ in one step and for each $t \in C(u)$
write $v_t$, $e_t$ and $f_t$ for the root of $t$, edge added to $t$ to get $u$ and edge taken away from $t$
to get $u$ respectively. If $u$ is rooted at $x$, then

$$
\begin{aligned}
P(T_{n-1} = u) &= \sum_t P(T_n = t \text{ and } T_{n-1} = u) \\
&= \sum_{t \in C(u)} \pi(t) \frac{w(e_t)}{d(v_t)} \\
&= \sum_{t \in C(u)} \frac{\pi(u) d(v_t) w(f_t)}{d(x) w(e_t)} \cdot \frac{w(e_t)}{d(v_t)} \\
&= \pi(u) \sum_{t \in C(u)} \frac{w(f_t)}{d(x)} \\
&= \pi(u),
\end{aligned}
$$

since as $t$ ranges over all trees that can get to $u$, $f_t$ ranges over all edges incident to $x$.

Finally, we have again that $\tau(WRW_x^G)$ is distributed as $\tau(SWRW^G)$ conditioned
on having root $x$, and since the unconditioned $\pi$ is proportional to $d(x)$ times
the weight of the tree (product of the edge weights), the factor of $d$ is constant and
$P(\tau(WRW_x^G) = t)$ is proportional to $\prod_{e \in t} w(e)$ for any $t$ rooted at $x$. Thus $\tau(WRW_x^G)$
is distributed identically to WST. □

2.4 Applying the random walk construction to our model 

Although the benefit is not yet clear, we have succeeded in translating the question of
determining $P(e \in T)$ from a question about uniform spanning trees to a question about
simple random walks. To see how this works, suppose that $e$ connects the vertices $x$ and
$y$ and generate a uniform spanning tree by the random walk construction starting at $x$:
$T$ = the tree gotten from $\tau(SRW_x)$ by ignoring the root and arrows. If $e \in T$ then its
orientation in $\tau(SRW_x)$ must be from $y$ to $x$, and so $e \in T$ if and only if $SRW(k-1) = x$
where $k$ is the least $k$ for which $SRW(k) = y$. In other words,

$$ P(e \in T) = P(\text{first visit of } SRW_x \text{ to } y \text{ is along } e). \tag{1} $$

The computation of this random walk probability turns out to be tractable.

More important is the fact that this may be iterated to get probabilities such as
$P(e, f \in T)$. This requires two more definitions. If $G$ is a finite connected graph and
$e$ is an edge of $G$ whose removal does not disconnect $G$, then the deletion of $G$ by $e$ is
the graph $G \setminus e$ with the same vertex set and the same edges minus $e$. If $e$ is any edge
that connects distinct vertices $x$ and $y$, then the contraction of $G$ by $e$ is the graph $G/e$
whose vertices are the vertices of $G$ with $x$ and $y$ replaced by a single vertex $x * y$. There
is an edge $\rho(f)$ of $G/e$ for every edge $f$ of $G$, where if one or both endpoints of $f$
is $x$ or $y$ then that endpoint is replaced by $x * y$ in $\rho(f)$. We write $\rho(z)$ for the vertex
corresponding to $z$ in this correspondence, so $\rho(x) = \rho(y) = x * y$ and $\rho(z) = z$ for every
$z \neq x, y$. The following example shows $G_1$ and $G_1/e_4$. The edge $e_4$ itself maps to a
self-edge under $\rho$, $e_5$ becomes parallel to $e_6$ and $D$ and $E$ map to $D * E$.

It is easy to see that successive deletions and contractions may be performed in any
order with the same result. If $e_1, \ldots, e_r$ are edges of $G$ whose joint removal does not
disconnect $G$ then the successive deletion of these edges is permissible. Similarly if
$\{f_1, \ldots, f_s\}$ is a set of edges of $G$ that contains no cycle, these edges may be successively
contracted and the graph $G \setminus e_1, \ldots, e_r / f_1, \ldots, f_s$ is well-defined. It is obvious that the
spanning trees of $G \setminus e$ are just those spanning trees of $G$ that do not contain $e$. Almost
as obvious is a one to one correspondence between spanning trees of $G$ containing $e$ and
spanning trees of $G/e$: if $t$ is a spanning tree of $G$ containing $e$ then there is a spanning
tree of $G/e$ consisting of $\{\rho(f) : f \neq e, \ f \in t\}$.
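
The two correspondences say that the spanning trees of $G$ split into those of $G \setminus e$ and those of $G/e$, which is easy to confirm by counting on a small graph. The Python sketch below is an illustration with assumed names and an assumed edge-list representation (edges as pairs of vertex labels, the merged vertex $x * y$ represented by the tuple `(x, y)`). As in the text, the contracted edge is kept as a self-edge, which no spanning tree can use.

```python
from itertools import combinations

def count_spanning_trees(vertices, edges):
    """Count spanning trees by brute force over sets of |V| - 1 edges."""
    vertices = list(vertices)
    count = 0
    for subset in combinations(range(len(edges)), len(vertices) - 1):
        leader = {v: v for v in vertices}    # union-find forest

        def find(v):
            while leader[v] != v:
                v = leader[v]
            return v

        for i in subset:
            rx, ry = find(edges[i][0]), find(edges[i][1])
            if rx == ry:                     # cycle (or self-loop): not a tree
                break
            leader[rx] = ry
        else:
            count += 1
    return count

def delete(edges, i):
    """Edge list of G \\ e, where e is edge number i."""
    return edges[:i] + edges[i + 1:]

def contract(vertices, edges, i):
    """Vertices and edges of G / e: both endpoints of edge i become one vertex."""
    x, y = edges[i]
    merged = (x, y)                          # plays the role of x * y

    def rho(z):
        return merged if z in (x, y) else z

    new_vertices = [v for v in vertices if v not in (x, y)] + [merged]
    new_edges = [(rho(a), rho(b)) for a, b in edges]
    return new_vertices, new_edges
```

For the complete graph on four vertices, which has 16 spanning trees, deleting any one edge leaves 8 and contracting it leaves 8.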

To translate $P(e, f \in T)$ to the random walk setting, write this as
$P(e \in T) P(f \in T \mid e \in T)$. The first term has already been translated. The conditional distribution of a
uniform random spanning tree given that it contains $e$ is just uniform among those trees
containing $e$, so the second term is just $P_{G/e}(\rho(f) \in T)$ where the subscript $G/e$ refers to the fact
that $T$ is now taken to be a uniform random spanning tree of $G/e$. If $f$ connects $z$ and $x$
then this is in turn equal to $P(SRW^{G/e}_{\rho(x)} \text{ first hits } \rho(z) \text{ along } \rho(f))$. Both the terms have
thus been translated; in general it should be clear how this may be iterated to translate
the probability of any elementary event, $P(e_1, \ldots, e_r \in T \text{ and } f_1, \ldots, f_s \notin T)$, into a
product of random walk probabilities. It remains to be seen how these probabilities may
be calculated.



3 Random walks and electrical networks 

Sections 3.1 - 3.3 contain a development of the connection between random walks and 
electrical networks. The right place to read about this is in [8]; what you will see here is 
necessarily a bit rushed. Sections 3.5 and 3.6 contain similarly condensed material from 
other sources. 

3.1 Resistor circuits 

The electrical networks we discuss will have only two kinds of elements: resistors and
voltage sources. Picture the resistors as straight pieces of wire. A resistor network will
be built by soldering resistors together at their endpoints. That means that a diagram
of a resistor network will just look like a finite graph with each edge bearing a number:
the resistance. Associated with every resistor network $H$ is a weighted graph $G_H$ which
looks exactly like the graph just mentioned except that the weight of an edge is not the
resistance but the conductance, which is the reciprocal of the resistance. The distinction
between $H$ and $G_H$ is only necessary while we are discussing precise definitions and will
then be dropped. A voltage source may be a single battery that provides a specified
voltage difference (explained below) across a specified pair of vertices or it may be a
more complicated device to hold various voltages fixed at various vertices of the network.
Here is an example of a resistor network on a familiar graph, with a one volt battery
drawn as a dashed box. Resistances on the edges (made up arbitrarily) are given in ohms.

The electrical properties of such a network are given by Kirchhoff's laws. For the sake
of exposition I will give the laws numbers, although these do not correspond to the way
Kirchhoff actually stated the laws. The first law is that every vertex of the network has
a voltage which is a real number. The second law gives every oriented edge (resistor) a
current. Each edge has two possible orientations. Say an edge connects $x$ and $y$. Then
the current through the edge is a real number whose sign depends on which orientation
you choose for the edge. In other words, the current $I(\vec{xy})$ that flows from $x$ to $y$ is some
real number and the current $I(\vec{yx})$ is its negative. (Note though that the weights $w(e)$ are
always taken to be positive; weights are functions of unoriented edges, whereas currents are
functions of oriented edges.) If $I(e)$ denotes the current along an oriented edge $e = \vec{xy}$,
$V(x)$ denotes the voltage at $x$ and $R(e)$ denotes the resistance of $e$, then quantitatively,
the second law says

$$ I(\vec{xy}) = [V(x) - V(y)] R(e)^{-1}. \tag{2} $$

Kirchhoff's third law is that the total current flowing into a vertex equals the total current
flowing out, or in other words

$$ \sum_{y \sim x} I(\vec{xy}) = 0. \tag{3} $$

This may be rewritten using (2). Recalling that in the weighted graph $G_H$, the weight
$w(e)$ is just $R(e)^{-1}$ and that $d(v)$ denotes the sum of $w(e)$ over edges incident to $v$, we
get at every vertex $x$ an equation

$$ 0 = \sum_{y \sim x} [V(x) - V(y)] w(xy) = V(x) d(x) - \sum_{y \sim x} V(y) w(xy). \tag{4} $$

y~x y~x 

Since a voltage source may provide current, this may fail to hold at any vertex connected
to a voltage source. The above laws are sufficient to specify the voltages of the
network - and hence the currents - except that a constant may be added to all the voltages
(in other words, it is the voltage differences that are determined, not the absolute
voltages). In the above example the voltage difference across $AB$ is required to be one.
Setting the voltage at $B$ to zero (since the voltages are determined only up to an additive
constant) the reader may check that the voltages at $A$, $C$, $D$ and $E$ are respectively
1, 4/7, 5/7 and 6/7 and the currents through $AB$, $AE$, $ED$, $AD$, $DC$, $CB$ are respectively
1, 1/7, 1/7, 1/7, 2/7, 2/7.



3.2 Harmonic functions 

The voltages in a weighted graph $G$ (which we are now identifying with the resistor
network it represents) under application of a voltage source are calculated by finding a
solution to Kirchhoff's laws on $G$ with specified boundary conditions. For each vertex $x$
there is an unknown voltage $V(x)$. There is also a linear equation for every vertex not
connected to a voltage source, and an equation given by the nature of each voltage source.
Will these always be enough information so that Kirchhoff's laws have a unique solution?
The answer is yes and it is most easily seen in the context of harmonic functions.¹

¹ There is also the question of whether any solution exists, but addressing that would take us too far
afield. If you aren't convinced of its existence on physical grounds, wait until the next subsection where
a probabilistic interpretation for the voltage is given, and then deduce existence of a solution from the
fact that these probabilities obey Kirchhoff's laws.

If $f$ is a function on the vertices of a weighted graph $G$, define the excess of $f$ at a
vertex $v$, written $\Delta f(v)$, by

$$ \Delta f(v) = \sum_{y \sim v} [f(v) - f(y)] w(vy). $$

You can think of $\Delta$ as an operator that maps functions $f$ to other functions $\Delta f$ that
is a discrete analog of the Laplacian operator. A function $f$ from the vertices of a
finite weighted graph $G$ to the reals is said to be harmonic at a vertex $v$ if and only if
$\Delta f(v) = 0$. Note that for any function $f$, the sum of the excesses $\sum_{v \in G} \Delta f(v) = 0$, since
each $[f(x) - f(y)]w(xy)$ cancels a $[f(y) - f(x)]w(yx)$ due to $w(xy) = w(yx)$. To see what
harmonic functions are intuitively, consider the special case where $G$ is unweighted, i.e.
all of the edge weights are one. Then a function is harmonic if and only if its value at
a vertex $x$ is the average of the values at the neighbors of $x$. In the weighted case the
same is true, but with a weighted average! Here is an easy but important lemma about
harmonic functions.

Lemma 3.1 (Maximum principle) Let $V$ be a function on the vertices of a finite
connected weighted graph, harmonic everywhere except possibly at vertices of some set
$X = \{x_1, \ldots, x_k\}$. Then $V$ attains its maximum and minimum on $X$. If $V$ is harmonic
everywhere then it is constant.

Proof: Let $S$ be the set of vertices where $V$ attains its maximum. Certainly $S$ is
nonempty. If $x \in S$ has a neighbor $y \notin S$ then $V$ cannot be harmonic at $x$ since
$V(x)$ would then be a weighted average of values less than or equal to $V(x)$ with at
least one strictly less. In the case where $V$ is harmonic everywhere, this shows that no
vertex in $S$ has a neighbor not in $S$, hence since the graph is connected every vertex is
in $S$ and $V$ is constant. Otherwise, suppose $V$ attains its maximum at some $y \notin X$ and
pick a path connecting $y$ to some $x \in X$. The entire path must then be in $S$ up until
and including the first vertex along the path at which $V$ is not harmonic. This is some
$x' \in X$. The argument for the minimum is just the same. □



Kirchhoff's third law (4) says that the voltage function is harmonic at every $x$ not connected
to a voltage source. Suppose we have a voltage source that provides a fixed voltage
at some specified set of vertices. Say for concreteness that the vertices are $x_1, \ldots, x_k$ and
the voltages produced at these vertices are $c_1, \ldots, c_k$. We now show that Kirchhoff's laws
determine the voltages everywhere else, i.e. there is at most one solution to them.

Theorem 3.2 Let $V$ and $W$ be real-valued functions on the vertices of a finite weighted
graph $G$. Suppose that $V(x_i) = W(x_i) = c_i$ for some set of vertices $x_1, \ldots, x_k$ and
$1 \leq i \leq k$ and that $V$ and $W$ are harmonic at every vertex other than $x_1, \ldots, x_k$. Then
$V = W$.

Proof: Consider the function $V - W$. It is easy to check that being harmonic at $x$ is
a linear property, so $V - W$ is harmonic at every vertex at which both $V$ and $W$ are
harmonic. Then by the Maximum Principle, $V - W$ attains its maximum and minimum
at some $x_i$. But $V - W = 0$ at every $x_i$, so $V - W \equiv 0$. □

Suppose that instead of fixing the voltages at a number of points, the voltage source
acts as a current source and supplies a fixed amount of current $I_i$ to vertices $x_i$, $1 \leq i \leq k$.
This is physically reasonable only if $\sum_{i=1}^k I_i = 0$. Then a net current of $I_i$ will have to
flow out of each $x_i$ into the network. Using (2) gives

$$ I_i = \sum_y w(x_i, y)(V(x_i) - V(y)) = \Delta V(x_i). $$

From this it is apparent that the assumption $\sum_i I_i = 0$ is algebraically as well as physically
necessary since the excesses must sum to zero. Kirchhoff's laws also determine the voltages
(up to an additive constant) of a network with current sources, as we now show.

Theorem 3.3 Let $V$ and $W$ be real-valued functions on the vertices of a finite weighted
graph $G$. Suppose that $V$ and $W$ both have excess $c_i$ at $x_i$ for some set of vertices $x_i$ and
reals $c_i$, $1 \leq i \leq k$. Suppose also that $V$ and $W$ are harmonic elsewhere. Then $V = W$
up to an additive constant.

Proof: Excess is linear, so the excess of $V - W$ is the excess of $V$ minus the excess of $W$.
This is zero everywhere, so $V - W$ is harmonic everywhere. By the Maximum Principle,
$V - W$ is constant. □



3.3 Harmonic random walk probabilities 

Getting back to the problem of random walks, suppose $G$ is a finite connected graph and
$x, a, b$ are vertices of $G$. Let's say that I want to calculate the probability that $SRW_x$
reaches $a$ before $b$. Call this probability $h_{ab}(x)$. It is not immediately obvious what this
probability is, but we can get an equation by watching where the random walk takes its
first step. Say the neighbors of $x$ are $y_1, \ldots, y_d$. Then $P(SRW_x(1) = y_i) = d^{-1}$ for each
$i \leq d$. If we condition on the event $SRW_x(1) = y_i$ then the probability of the walk reaching $a$
before $b$ is (by the Markov property) the same as if it had started out at $y_i$. This is just
$h_{ab}(y_i)$. Thus

$$ h_{ab}(x) = \sum_i P(SRW_x(1) = y_i) h_{ab}(y_i) = d^{-1} \sum_i h_{ab}(y_i). $$

In other words, $h_{ab}$ is harmonic at $x$. Be careful though: if $x$ is equal to $a$ or $b$, it doesn't
make sense to look one step ahead since $SRW_x(0)$ already determines whether the walk
hit $a$ or $b$ first. In particular, $h_{ab}(a) = 1$ and $h_{ab}(b) = 0$, with $h_{ab}$ being harmonic at
every $x \neq a, b$.

Theorem 3.2 tells us that there is only one such function $h_{ab}$. This same function
solves Kirchhoff's laws for the unweighted graph $G$ with voltages at $a$ and $b$ fixed at 1 and
0 respectively. In other words, the probability of $SRW_x$ reaching $a$ before $b$ is just the
voltage at $x$ when a one volt battery is connected to $a$ and $b$ and the voltage at $b$ is taken
to be zero. If $G$ is a weighted graph, we can use a similar argument: it is easy to check
that the first-step transition probabilities $p(x,y) = w(xy)/\sum_z w(xz)$ show that $h_{ab}(x)$ is
harmonic in the sense of weighted graphs. Summarizing this:

Theorem 3.4 Let $G$ be a finite connected weighted graph. Let $a$ and $b$ be vertices of $G$.
For any vertex $x$, the probability of $SRW_x$ reaching $a$ before $b$ is equal to the voltage at
$x$ in $G$ when the voltages at $a$ and $b$ are fixed at one and zero volts respectively.

Although more generality will not be needed we remark that this same theorem holds
when $a$ and $b$ are taken to be sets of vertices. The probability of $SRW_x$ reaching a vertex
in $a$ before reaching a vertex in $b$ is harmonic at vertices not in $a \cup b$, is zero on $b$ and
one on $a$. The voltage when vertices in $b$ are held at zero volts and vertices in $a$ are held
at one volt also satisfies this, so the voltages and the probabilities must coincide.
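
Since the hitting probability is the unique function that is one at $a$, zero at $b$ and a weighted average of its neighbors elsewhere, it can be approximated by simply repeating that averaging. The Python sketch below uses this relaxation idea (Gauss-Seidel sweeps), which is a numerical technique swapped in for illustration and not a method from the text; the function name, the input format and the number of sweeps are assumptions.

```python
def hitting_probability(neighbors, a, b, sweeps=2000):
    """Approximate h_ab(x), the chance that the walk from x reaches a before b.

    `neighbors[x]` is a list of (y, weight) pairs for the edges at x
    (hypothetical input format). The boundary values h(a) = 1 and h(b) = 0 are
    held fixed; every other value is repeatedly replaced by the weighted
    average of the values at its neighbors.
    """
    h = {x: 0.0 for x in neighbors}
    h[a] = 1.0
    for _ in range(sweeps):
        for x in neighbors:
            if x == a or x == b:
                continue
            total = sum(w for y, w in neighbors[x])
            h[x] = sum(w * h[y] for y, w in neighbors[x]) / total
    return h

# Unweighted path 0 - 1 - 2 - 3 with a = 0 and b = 3.
path = {0: [(1, 1)], 1: [(0, 1), (2, 1)], 2: [(1, 1), (3, 1)], 3: [(2, 1)]}
h = hitting_probability(path, a=0, b=3)
```

On this path the values converge to $h(1) = 2/3$ and $h(2) = 1/3$, the voltages of three equal resistors in series with one volt across the ends.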

Having given an interpretation of voltage in probabilistic terms, the next thing to find
is a probabilistic interpretation of the current. The arguments are similar so they will
be treated briefly; a more detailed treatment appears in [8]. First we will need to find
an electrical analogue for the numbers $u_{ab}(x)$ which are defined probabilistically as the
expected number of times $SRW_a$ hits $x$ before the first time it hits $b$. This is defined to
be zero for $x = b$. For any $x \neq a, b$, let $y_1, \ldots, y_r$ be the neighbors of $x$. Then the number
of visits to $x$ before hitting $b$ is the sum over $i$ of the number of times $SRW_a$ hits $y_i$
before $b$ and goes to $x$ on the next move (the walk had to be somewhere the move before
it hit $x$). By the Markov property, the expectation of the $i$-th term is
$u_{ab}(y_i) p(y_i, x) = u_{ab}(y_i) w(x y_i)/d(y_i)$.
Letting $\phi_{ab}(z)$ denote $u_{ab}(z)/d(z)$ for any $z$, this yields

$$ d(x) \phi_{ab}(x) = u_{ab}(x) = \sum_i u_{ab}(y_i) w(x y_i)/d(y_i) = \sum_i w(x y_i) \phi_{ab}(y_i). $$

In other words $\phi_{ab}$ is harmonic at every $x \neq a, b$. Writing $K_{ab}$ for $\phi_{ab}(a)$ we then have
that $\phi_{ab}$ is $K_{ab}$ at $a$, zero at $b$ and harmonic elsewhere, hence it is the same function as
the voltage induced by a battery of $K_{ab}$ volts connected to $a$ and $b$, with the voltage
at $b$ taken to be zero. Without yet knowing what $K_{ab}$ is, this determines $\phi_{ab}$ up to a
constant multiple. This in turn determines $u_{ab}$, since $u_{ab}(x) = d(x) \phi_{ab}(x)$.

Now imagine that we watch $SRW_a$ to see when it crosses over a particular edge $xy$
and count plus one every time it crosses from $x$ to $y$ and minus one every time it crosses
from $y$ to $x$. Stop counting as soon as the walk hits $b$. Let $H_{ab}(\vec{xy})$ denote the expected
number of signed crossings. ($H$ now stands for harmonic, not for the name of a resistor
network.) We can calculate $H$ in terms of $u_{ab}$ by counting the plusses and the minuses
separately. The expected number of plus crossings is just the expected number of times
the walk hits $x$, multiplied by the probability on each of these occasions that the walk
crosses to $y$ on the next move. This is $u_{ab}(x) w(xy)/d(x)$. Similarly the expected number
of minus crossings is $u_{ab}(y) w(xy)/d(y)$. Thus

$$
\begin{aligned}
H_{ab}(\vec{xy}) &= u_{ab}(x) w(xy)/d(x) - u_{ab}(y) w(xy)/d(y) \\
&= w(xy)[\phi_{ab}(x) - \phi_{ab}(y)].
\end{aligned}
$$

But $\phi_{ab}(x) - \phi_{ab}(y)$ is just the voltage difference across $xy$ induced by a $K_{ab}$-volt battery
across $a$ and $b$. Using (2) and $w(xy) = R(xy)^{-1}$ shows that the expected number of
signed crossings of $xy$ is just the current induced in $\vec{xy}$ by a $K_{ab}$-volt battery connected
to $a$ and $b$. A moment's thought shows that the expected number of signed crossings of
all edges leading out of $a$ must be one, since the walk is guaranteed to leave $a$ one more
time than it returns to $a$. So the current supplied by the $K_{ab}$-volt battery must be one
amp. Another way of saying this is that

$$ \Delta \phi_{ab} = \delta_a - \delta_b. \tag{5} $$

Instead of worrying about what $K_{ab}$ is, we may just as well say that the expected number
of signed crossings of $xy$ by $SRW_a$ before hitting $b$ is the current induced when one amp is
supplied to $a$ and drawn out at $b$.
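
A tiny worked example may help fix the definitions. It is not taken from the text: assume the unweighted path with vertices $a$, $x$, $b$ and the two edges $ax$ and $xb$, so that $d(a) = d(b) = 1$ and $d(x) = 2$. Each visit to $x$ ends the count with probability 1/2, so the number of visits to $x$ before hitting $b$ is geometric with mean 2, and every visit to $x$ is immediately preceded by a visit to $a$.

```latex
\begin{align*}
u_{ab}(a) &= 2, & u_{ab}(x) &= 2, & u_{ab}(b) &= 0, \\
\phi_{ab}(a) &= u_{ab}(a)/d(a) = 2, & \phi_{ab}(x) &= u_{ab}(x)/d(x) = 1, & \phi_{ab}(b) &= 0, \\
H_{ab}(\vec{ax}) &= w(ax)[\phi_{ab}(a) - \phi_{ab}(x)] = 1, &
H_{ab}(\vec{xb}) &= w(xb)[\phi_{ab}(x) - \phi_{ab}(b)] = 1. &&
\end{align*}
```

So $\phi_{ab}$ is harmonic at $x$ (its value 1 is the average of 2 and 0), $K_{ab} = 2$, and a 2-volt battery across two one-ohm resistors in series does drive exactly one amp through each edge.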

3.4 Electricity applied to random walks applied to spanning trees

Finally we can address the random walk question that relates to spanning trees. In 
particular, the claim that the probability in equation (1) is tractable will be borne out 
several different ways. First we will see how the probability may be "calculated" by 
an analog computing device, namely a resistor network. In the next subsection, the 
computation will be carried out algebraically and very neatly, but only for particularly 
nice symmetric graphs. At the end of the section, a universal method will be given for the 
computation which is a little messier. Finally in Section 4 the question of the individual 
probabilities in (1) will be avoided altogether and we will see instead how values for 
these probabilities (wherever they might come from) determine the probabilities for all 
contractions and deletions of the graph and therefore determine all the joint probabilities
$P(e_1, \ldots, e_k \in T)$ and hence the entire measure.

Let $e = xy$ be any edge of a finite connected weighted graph $G$. Run $SRW_x$ until it
hits $y$. At this point either the walk just moved along $e$ from $x$ to $y$ - necessarily for the
first time - and $e$ will be in the tree $T$ given by $\tau(SRW_x)$, or else the walk arrived at
$y$ via a different edge in which case the walk never crossed $e$ at all and $e \notin T$. In either
case the walk never crossed from $y$ to $x$ since it stops if it hits $y$. Then the expected
number of signed crossings of $e = \vec{xy}$ by $SRW_x$ up to the first time it hits $y$ is equal to
the probability of first reaching $y$ along $e$ which equals $P(e \in T)$. Putting this together
with the electrical interpretation of signed crossings gives

Theorem 3.5 $P(e \in T)$ = the fraction of the current that goes through edge $e$ when a
battery is hooked up to the two endpoints of $e$.

□
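
Theorem 3.5 can be tested exactly on small graphs by computing both sides independently. The Python sketch below is an illustration only; all names and the `(x, y, weight)` edge format are assumptions. One side solves the harmonic equations (4) in exact rational arithmetic with the endpoints of $e$ held at one and zero volts, and the other enumerates spanning trees by brute force.

```python
from fractions import Fraction
from itertools import combinations

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def voltages(vertices, edges, fixed):
    """Voltages equal to `fixed` on its keys and harmonic at every other vertex."""
    free = [v for v in vertices if v not in fixed]
    index = {v: i for i, v in enumerate(free)}
    A = [[Fraction(0)] * len(free) for _ in free]
    b = [Fraction(0)] * len(free)
    for x, y, w in edges:
        if x == y:
            continue                         # a self-loop carries no current
        for u, v in ((x, y), (y, x)):
            if u in index:
                A[index[u]][index[u]] += w   # the d(u) V(u) term of (4)
                if v in index:
                    A[index[u]][index[v]] -= w
                else:
                    b[index[u]] += w * fixed[v]
    solution = solve(A, b)
    V = dict(fixed)
    V.update({v: solution[index[v]] for v in free})
    return V

def current_fraction(vertices, edges, i):
    """Share of the current leaving x that flows through edge i = (x, y, w)."""
    x, y, w = edges[i]
    V = voltages(vertices, edges, {x: Fraction(1), y: Fraction(0)})
    out_of_x = Fraction(0)
    for p, q, wt in edges:
        if p == q:
            continue
        if p == x:
            out_of_x += wt * (V[x] - V[q])
        elif q == x:
            out_of_x += wt * (V[x] - V[p])
    return w * (V[x] - V[y]) / out_of_x

def tree_probability(vertices, edges, i):
    """P(edge i is in T) for the weight-selected spanning tree, by enumeration."""
    total = with_edge = Fraction(0)
    for subset in combinations(range(len(edges)), len(vertices) - 1):
        leader = {v: v for v in vertices}

        def find(v):
            while leader[v] != v:
                v = leader[v]
            return v

        weight = Fraction(1)
        for j in subset:
            rx, ry = find(edges[j][0]), find(edges[j][1])
            if rx == ry:
                break                        # contains a cycle: not a tree
            leader[rx] = ry
            weight *= edges[j][2]
        else:
            total += weight
            if i in subset:
                with_edge += weight
    return with_edge / total
```

For the triangle with edge weights 1, 2 and 3, both `current_fraction([0, 1, 2], tri, 0)` and `tree_probability([0, 1, 2], tri, 0)` with `tri = [(0, 1, 1), (1, 2, 2), (0, 2, 3)]` come out to 5/11.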

This characterization leads to a proof of Theorem 1.1 provided we are willing to accept 
a proposition that is physically obvious but not so easy to prove, namely 

Theorem 3.6 (Rayleigh's monotonicity law) The effective resistance of a circuit 
cannot increase when a new resistor is added. 

The reason this is physically obvious is that adding a new resistor provides a new path
for current to take while allowing the current still to flow through all the old paths.
Theorem 1.1 says that the conditional probability of $e \in T$ given $f \in T$ must be less
than or equal to the unconditional probability; equivalently, the probability of $e \in T$
conditioned on $f \notin T$ is at least the unconditional probability. Using Theorem 3.5 and the fact that the
probabilities conditioned on $f \notin T$ are just the probabilities for WST on $G \setminus f$, this
boils down to showing that the fraction of current flowing directly across $e$ is no greater
on $G$ than it is on $G \setminus f$. The battery across $e$ meets two parallel resistances: $e$ and
the effective resistance of the rest of $G$. Writing $R'$ for the latter, the fraction of current
flowing through $e$ is $R'/(R(e) + R')$, which increases with $R'$. Rayleigh's theorem says that
the effective resistance of the rest of $G$ including $f$ is at most the effective resistance of
the rest of $G \setminus f$, so the fraction flowing through $e$ on $G$ is at most the fraction flowing through $e$
on $G \setminus f$. In Section 4, a proof will be given that does not rely on Rayleigh.

3.5 Algebraic calculations for the square lattice 

If $G$ is a finite graph, then the functions from the vertices of $G$ to the reals form a
finite-dimensional real vector space. The operator $\Delta$ that maps a function $V$ to its excess is a
linear operator on this vector space. In this language, the voltages in a resistor network
with one unit of current supplied to $a$ and drawn out at $b$ are the unique (up to additive
constant) function $V$ that solves $\Delta V = \delta_a - \delta_b$. Here $\delta_x$ is the function that is one at $x$
and zero elsewhere. This means that $V$ can be calculated simply by inverting $\Delta$ in the
basis $\{\delta_x ; x \in G\}$. Although $\Delta$ is technically not invertible, its nullspace has dimension
one so it can be inverted on a set of codimension one. A classical determination of $V$ for
arbitrary graphs is carried out in the next subsection. The point of this subsection is to
show how the inverse can be obtained in a simpler way for nice graphs.

The most general "nice" graphs to which the method will apply are the infinite
$\mathbb{Z}^d$-periodic lattices. Since in this article I am restricting attention to finite graphs, I will not
attempt to be general but will instead show a single example. The reader may look in [6]
for further generality. The example considered here is the square lattice. This is just the
graph you see on a piece of graph paper, with vertices at each pair of integer coordinates
and four edges connecting each point to its nearest neighbors. The exposition will be
easiest if we consider a finite square piece of this and impose wrap-around boundary
conditions. Formally, let $T_n$ ($T$ for torus) be the graph whose vertices are pairs of integers
$\{(i,j) : 0 \leq i, j \leq n-1\}$ and for which two points are connected if and only if they agree
in one component and differ by one mod $n$ in the other component. Here is a picture of
this with $n = 3$ and the broken edges denoting edges that wrap around to the other side
of the graph. The graph is unweighted (all edge weights are one).

figure 7

Let $\zeta = e^{2\pi i/n}$ denote the first $n$-th root of unity. To invert $\Delta$ we exhibit its eigenvectors.
Since the vector space is a space of functions, the eigenvectors are called eigenfunctions.
For each pair of integers $0 \leq k, l \leq n-1$ let $f_{kl}$ be the function on the vertices of $T_n$
defined by

$$ f_{kl}(i,j) = \zeta^{ki+lj}. $$

If you have studied group representations, you will recognize $f_{kl}$ as the representations
of the group $T_n = (\mathbb{Z}/n\mathbb{Z})^2$ and in fact the rest of this section may be restated more
compactly in terms of characters of this abelian group.

It is easy to calculate

$$
\begin{aligned}
\Delta f_{kl}(i,j) &= 4\zeta^{ki+lj} - \zeta^{k(i+1)+lj} - \zeta^{k(i-1)+lj} - \zeta^{ki+l(j+1)} - \zeta^{ki+l(j-1)} \\
&= \zeta^{ki+lj}(4 - \zeta^{k} - \zeta^{-k} - \zeta^{l} - \zeta^{-l}) \\
&= \zeta^{ki+lj}(4 - 2\cos(2\pi k/n) - 2\cos(2\pi l/n)).
\end{aligned}
$$

Since the multiplicative factor $(4 - 2\cos(2\pi k/n) - 2\cos(2\pi l/n))$ does not depend on $i$
or $j$, this shows that $f_{kl}$ is indeed an eigenfunction for $\Delta$ with eigenvalue
$\lambda_{kl} = 4 - 2\cos(2\pi k/n) - 2\cos(2\pi l/n)$.
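
The eigenfunction claim is easy to confirm numerically. The short Python sketch below is an illustration with assumed function names: it applies the excess operator on the unweighted $n \times n$ torus to each $f_{kl}$ and compares the result with $\lambda_{kl} f_{kl}$ in floating point.

```python
import cmath
import math

def laplacian_on_torus(f, n):
    """Excess of f on the unweighted torus T_n: 4 f(i,j) minus the four neighbors."""
    return {
        (i, j): 4 * f[(i, j)]
        - f[((i + 1) % n, j)] - f[((i - 1) % n, j)]
        - f[(i, (j + 1) % n)] - f[(i, (j - 1) % n)]
        for i in range(n) for j in range(n)
    }

def eigenfunction(k, l, n):
    """f_kl(i, j) = zeta^(k i + l j) with zeta = exp(2 pi i / n)."""
    zeta = cmath.exp(2j * math.pi / n)
    return {(i, j): zeta ** (k * i + l * j) for i in range(n) for j in range(n)}

def eigenvalue(k, l, n):
    """lambda_kl = 4 - 2 cos(2 pi k / n) - 2 cos(2 pi l / n)."""
    return 4 - 2 * math.cos(2 * math.pi * k / n) - 2 * math.cos(2 * math.pi * l / n)

def max_eigen_error(n):
    """Largest |Delta f_kl - lambda_kl f_kl| over all k, l and all vertices."""
    worst = 0.0
    for k in range(n):
        for l in range(n):
            f = eigenfunction(k, l, n)
            Lf = laplacian_on_torus(f, n)
            lam = eigenvalue(k, l, n)
            worst = max(worst, max(abs(Lf[p] - lam * f[p]) for p in f))
    return worst
```

`max_eigen_error(n)` should be at the level of rounding error for any $n \geq 3$.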
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Now if {vk} are eigenvectors for some linear operator A with eigenvalues {A^}, then 
for any constants {ck}, 

A ^(Y1 °k v k) = K lc kVk- (6) 
k k 

If some Afc is equal to zero, then the range of A does not include vectors w with Ck 7^ 0, so 
A^w does not exist for such w and indeed the formula blows up due to the A^ 1 . In our 
case \ki = 4 — 2 cos(27r/c/n) — 2cos(27r//n) = only when k — I — 0. Thus to calculate 
A^ 1 (5 a — 5b) we need to figure out coefficents Cki for which 5 a — 5b = J2ki c kifki and verify 
that Coo — 0. For this puropose, it is fortunate that the eigenfunctions {fki} are actually 
a unitary basis in the inner product < f,g >= f(h j)g(h ])■ You can check this by 
writing 

<fkufm> >=J2C ki+li e 7ITF1 ; 

ij 

elementary algebra show this to be one if k — k' and / = I' and zero otherwise, which what 

it means to be unitary. Unitary bases are great for calculation because the coefficients 

{cm} of any V in a unitary eigenbasis {fki} are given by cm =< V, fki >■ In our case, 

this means cm = V(i, j)fu{h j)- Letting a be the vertex (0, 0), b be the vertex (1, 0) 

k 

and V = 5 a — 5b, this gives = 1 — C an d hence 

5 a -5b = Y.^-t)fki- 

k,i 

We can now plug this into equation (6), since clearly coo = 0. This gives 
AV = 5 a -5 b 



<* v(i,j) = cf 00 (i,j)+ £ (i - C*)Aw7«(m) 

(fc,0^(o,o) 

= c+ V 1 ~ ^ C ki+lj (7) 

(fc.ijfeo.o) 4 - 2 c °s(2vrA;/n) - 2 cos(27r//n) ' y ' 
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This sum is easy to compute exactly and to approximate efficiently when n is large. In 
particular as n — > oo the sum may be replaced by an integral which by a small miracle 
admits an exact computation. Details of this may be found in [16, page 148]. You may 
check your arithmetic against mine by using (7) to derive the voltages for a one volt 
battery placed across the bottom left edge e of T 3 and across the bottom left edge e' of 
T 4 : 

5/8   3/8   1/2
5/8   3/8   1/2
 1     0    1/2
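The arithmetic for $T_3$ can be checked mechanically. The following is a minimal sketch, not part of the original text: it evaluates the Fourier sum (7) with the $n^{-2}$ normalisation of the inner product, takes $a = (0,0)$ and $b = (1,0)$ with the first coordinate running left to right, and then shifts and rescales so that $V(a) = 1$ and $V(b) = 0$. The function name `torus_voltages` is an illustrative choice.

```python
import cmath
import math

def torus_voltages(n):
    """Voltages on T_n for a one volt battery across a=(0,0) and b=(1,0).

    Evaluates the Fourier sum (7) for a unit current flow, then shifts
    and rescales so that V(a) = 1 and V(b) = 0.
    """
    zeta = cmath.exp(2j * math.pi / n)

    def unit_current(i, j):
        total = 0.0
        for k in range(n):
            for l in range(n):
                if (k, l) == (0, 0):
                    continue  # zero eigenvalue: the constant mode, absorbed into c
                lam = (4 - 2 * math.cos(2 * math.pi * k / n)
                         - 2 * math.cos(2 * math.pi * l / n))
                term = (1 - zeta ** (-k)) / lam * zeta ** (k * i + l * j)
                total += term.real  # imaginary parts cancel over the full sum
        return total / n ** 2

    va, vb = unit_current(0, 0), unit_current(1, 0)
    return {(i, j): (unit_current(i, j) - vb) / (va - vb)
            for i in range(n) for j in range(n)}

V = torus_voltages(3)
for j in (2, 1, 0):  # print the top row first, as in the picture
    print([round(V[(i, j)], 4) for i in range(3)])
```

Before rescaling, the voltage drop across the edge for a unit current is 4/9, which is the factor used later when the same picture is renormalised to a unit current flow.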

Section 5 shows how to put these numbers to good use, but we can already make
one calculation based on Theorem 3.5. The four currents flowing out of the bottom left
vertex under the voltages shown are given by the voltage differences: 1, 3/8, 1/2 and 3/8.
The fraction of the current flowing directly through the bottom left edge $e$ is 8/18, and
according to Theorem 3.5, this is $P(e \in T)$. An easy way to see this is right is by the
symmetry of the graph $T_3$. Each of the 18 edges should be equally likely to be in $T$, and
since every spanning tree has 8 edges, the probability of any given edge being in the tree
must be 8/18.

3.6 Electrical networks and spanning trees

The order in which topics have been presented so far makes sense from an expository
viewpoint but is historically backwards. The first interest in enumerating spanning trees
came from problems in electrical network theory. To set the record straight and also to
close the circle of ideas

spanning trees $\to$ random walks $\to$ electrical networks $\to$ spanning trees

I will spend a couple of paragraphs on this remaining connection.

Let $G$ be a finite weighted graph. Assume there are no voltage sources and the
quantity of interest is the effective resistance between two vertices $a$ and $b$. This is
defined to be the voltage it is necessary to place across $a$ and $b$ to induce a unit current
flow. A classical theorem known to Kirchhoff is:

Theorem 3.7 Say $s$ is an $a,b$-spanning bitree if $s$ is a spanning forest with two components,
one containing $a$ and the other containing $b$. The effective resistance between $a$
and $b$ may be computed from the weighted graph $G$ by taking the quotient $N/D$ where

$$D = \sum_{\text{spanning trees } t} \Big( \prod_{e \in t} w(e) \Big)$$

is the sum of the weights of all spanning trees of $G$ and

$$N = \sum_{a,b\text{-spanning bitrees } s} \Big( \prod_{e \in s} w(e) \Big)$$

is the analogous sum over $a,b$-spanning bitrees. □

To see how this is implied by Theorem 3.5 and equation (1), imagine adding an
extra one ohm resistor from $a$ to $b$. The probability of this edge being chosen in a WST
on the new graph is by definition given by summing the weights of trees containing the
new edge and dividing by the total sum of the weights of all spanning trees. Clearly
$D$ is the sum of the weights of trees not containing the extra edge. But the trees
containing the extra edge are in one-to-one correspondence with $a,b$-spanning bitrees (the
correspondence being to remove the extra edge). The extra edge has weight one, so the
sum of the weights of trees that do contain the extra edge is $N$ and the probability of
a WST containing the extra edge is $N/(N+D)$. By equation (1) and Theorem 3.5,
this must then be the fraction of current flowing directly through the extra edge when a
battery is placed across $a$ and $b$. Thinking of the new circuit as consisting of the extra
edge in parallel with $G$, the fractions of the current passing through the two components
are proportional to the inverses of their resistances, so the ratio of the resistance of the
extra edge to the rest of the circuit must be $D : N$. Since the extra edge has resistance
one, the effective resistance of the rest of the circuit is $N/D$.
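For very small graphs the quotient $N/D$ can be evaluated by brute force. The sketch below is illustrative only: it enumerates edge subsets, treats each weight as a conductance (so a weight-one edge is a one ohm resistor, an assumption that matches the unweighted examples), and uses the hypothetical names `components` and `effective_resistance`.

```python
from fractions import Fraction
from itertools import combinations

def components(num_vertices, edges):
    """Return a component label for every vertex (simple union-find)."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v, _ in edges:
        parent[find(u)] = find(v)
    return [find(v) for v in range(num_vertices)]

def effective_resistance(num_vertices, edges, a, b):
    """N/D as in Theorem 3.7; edges are (u, v, weight) triples."""
    def weight(subset):
        w = Fraction(1)
        for _, _, wt in subset:
            w *= wt
        return w

    # A connected subset with (vertices - 1) edges is a spanning tree.
    D = Fraction(0)
    for t in combinations(edges, num_vertices - 1):
        if len(set(components(num_vertices, t))) == 1:
            D += weight(t)
    # A subset with (vertices - 2) edges and two components is a spanning
    # forest; keep it only when it separates a from b.
    N = Fraction(0)
    for s in combinations(edges, num_vertices - 2):
        labels = components(num_vertices, s)
        if len(set(labels)) == 2 and labels[a] != labels[b]:
            N += weight(s)
    return N / D

triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
square = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
print(effective_resistance(3, triangle, 0, 1))  # one ohm in parallel with two ohms
print(effective_resistance(4, square, 0, 1))    # one ohm in parallel with three ohms
```

Both values agree with the usual series and parallel rules, which is the point of the theorem: the tree counts encode the same information as the circuit laws.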

The next problem of course was to efficiently evaluate the sum of the weights of all
spanning trees of a weighted graph. The solution to this problem is almost as well known
and can be found, among other places, in [7].

Theorem 3.8 (Matrix-Tree Theorem) Let $G$ be a finite, simple, connected, weighted
graph and define a matrix indexed by the vertices of $G$ by letting $M(x,x) = d(x)$,
$M(x,y) = -w(\vec{xy})$ if $x$ and $y$ are connected by an edge, and $M(x,y) = 0$ otherwise.
Then for any vertex $x$, the sum of the weights of all spanning trees of $G$ is equal to the
determinant of the matrix gotten from $M$ by deleting the row and column corresponding
to $x$.

The matrix $M$ is nothing but a representation of $\Delta$ with respect to the basis $\{\delta_x\}$.
Recalling that the problem essentially boils down to inverting $\Delta$, the only other ingredient
in this theorem is the trick of inverting the action of a singular matrix on an element of
its range by inverting the largest invertible principal minor of the matrix. Details can be
found in [7]. □
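As an illustration of Theorem 3.8, the following sketch counts the spanning trees of the unweighted torus $T_3$ using exact rational arithmetic. The helper names (`det`, `torus_laplacian`, `count_spanning_trees`) are illustrative and the elimination routine is a generic one, not anything prescribed by the text.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination over exact rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def torus_laplacian(n):
    """The matrix M of Theorem 3.8 for the unweighted torus T_n (n >= 3)."""
    index = {(i, j): i * n + j for i in range(n) for j in range(n)}
    M = [[0] * (n * n) for _ in range(n * n)]
    for (i, j), x in index.items():
        M[x][x] = 4  # every vertex has degree four
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            y = index[((i + di) % n, (j + dj) % n)]
            M[x][y] -= 1
    return M

def count_spanning_trees(M):
    """Delete the row and column of the first vertex and take the determinant."""
    reduced = [row[1:] for row in M[1:]]
    return det(reduced)

print(count_spanning_trees(torus_laplacian(3)))
```

For $T_3$ the reduced matrix is eight by eight and the determinant is 11664.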



4 Transfer-impedances 



In the last section we saw how to calculate $P(e \in T)$ in several ways: by Theorems 3.5
or 3.7 in general and by equations such as (7) in particularly symmetric cases. By
repeating the calculations in Theorems 3.5 and 3.7 for contractions and deletions of a
graph (see Section 2.4), we could then find enough conditional probabilities to determine
the probability of any elementary event $P(e_1, \ldots, e_r \in T \text{ and } f_1, \ldots, f_s \notin T)$. Not only is
this inefficient, but it fails to apply to the symmetric case of equation (7) since contracting
or deleting edges of the graph breaks the symmetry. The task at hand is to alleviate this problem
by showing how the data we already know how to get (current flows on $G$) determine the
current flows on contractions and deletions of $G$ and thereby determine all the elementary
probabilities for WST on $G$. This will culminate in a proof of Theorem 1.2, which
encapsulates all of the necessary computation into a single determinant.

4.1 An electrical argument 

To keep notation to a minimum this subsection will only deal with unweighted, $D$-regular
graphs. Begin by stating explicitly the data that will be used to determine all other
probabilities. For oriented edges $e = \vec{xy}$ and $f = \vec{zw}$ in a finite connected graph $G$,
define the transfer-impedance $H(e,f) = \phi_{xy}(z) - \phi_{xy}(w)$ which is equal to the voltage
difference across $f$, $V(z) - V(w)$, when one amp of current is supplied to $x$ and drawn
out at $y$. We will assume knowledge of $H(e,f)$ for every pair of edges in $G$ (presumably
via some analog calculation, or in a symmetric case by equation (7) or something similar)
and show how to derive all other probabilities from these transfer-impedances.

Note first that $H(e,e)$ is the voltage across $e$ for a unit current flow supplied to one
end of $e$ and drawn out of the other. This is equal to the current flowing directly along
$e$ under a unit current flow and is thus $P(e \in T)$. The next step is to try a computation
involving a single contraction. For notation, recall the map $p$ which projects vertices
and edges of $G$ to vertices and edges of $G/f$. Fix edges $e = \vec{xy}$ and $f = \vec{zw}$ and let
$\{\bar V(v) : v \in G/f\}$ be the voltages we need to solve for: voltages at vertices of $G/f$ when
a unit current is supplied to $p(x)$ and drawn out at $p(y)$. As we have seen, this means
$\Delta \bar V(v) = +1$, $-1$ or $0$ according to whether $v = p(x)$, $p(y)$ or neither. Suppose we lift this to
a function $V$ on the vertices of $G$ by letting $V(x) = \bar V(p(x))$. Let's calculate the excess
$\Delta V$ of $V$. Each edge of $G$ corresponds to an edge in $G/f$, so for any $v \neq z,w$ in $G$,
$\Delta V(v) = \Delta \bar V(p(v))$; this is equal to $+1$ if $v = x$, $-1$ if $v = y$ and zero otherwise. Since
$p$ maps both $z$ and $w$ onto the same vertex $z * w$, we can't tell what $\Delta V$ is at $z$ or
$w$ individually, but $\Delta V(z) + \Delta V(w)$ will equal $\Delta \bar V(z * w)$ which will equal $+1$ if $z$ or $w$
coincides with $x$, $-1$ if $z$ or $w$ coincides with $y$ and zero otherwise (or if both coincide!).
The last piece of information we have is that $V(z) = V(w)$. Summarizing,

(i) $\Delta V = \delta_x - \delta_y + c(\delta_z - \delta_w)$;
(ii) $V(z) = V(w)$,

where $c$ is some unknown constant. To see that this uniquely defines $V$ up to an additive
constant, note that the difference between any two such functions has excess $c(\delta_z - \delta_w)$ for
some $c$, hence by the maximum principle reaches its maximum and minimum on $\{z,w\}$;
on the other hand the values at $z$ and $w$ are equal, so the difference is constant.

Now it is easy to find $V$. Recall from equation (5) that $\phi_{ab}$ satisfies $\Delta \phi_{ab} = \delta_a - \delta_b$.
The function $V$ we are looking for is then $\phi_{xy} + c\phi_{zw}$ where $c$ is chosen so that

$$\phi_{xy}(z) + c\phi_{zw}(z) = \phi_{xy}(w) + c\phi_{zw}(w).$$

In words, $V$ gives the voltages for a battery supplying unit current in at $x$ and out at $y$
plus another battery across $z$ and $w$ just strong enough to equalize the voltages at $z$ and $w$.
How strong is that? The battery supplying unit current to $x$ and $y$ induces by definition
a voltage $H(xy, zw)$ across $z$ and $w$. To counteract that, we need a $-H(xy, zw)$-volt
battery across $z$ and $w$. Since supplying one unit of current in at $z$ and out at $w$ produces
a voltage across $z$ and $w$ of $H(zw, zw)$, the current supplied by the counterbattery must
be $c = -H(xy, zw)/H(zw, zw)$. We do not need to worry about $H(zw, zw)$ being zero
since this means that $P(f \in T) = 0$ so we shouldn't be conditioning on $f \in T$. Going
back to the original problem, and writing $\bar V$ for the voltages on $G/f$,

$$\begin{aligned}
P(e \in T \mid f \in T) &= \bar V(p(x)) - \bar V(p(y)) \\
&= V(x) - V(y) \\
&= H(xy, xy) + H(zw, xy)\,\frac{-H(xy, zw)}{H(zw, zw)} \\
&= \frac{H(xy, xy)H(zw, zw) - H(xy, zw)H(zw, xy)}{H(zw, zw)} .
\end{aligned}$$

Multiplying this conditional probability by the unconditional probability $P(f \in T) = H(zw, zw)$ gives
the probability of both $e$ and $f$ being in $T$ which may be written as

$$P(e, f \in T) = \begin{vmatrix} H(xy, xy) & H(xy, zw) \\ H(zw, xy) & H(zw, zw) \end{vmatrix} .$$

Thus $P(e, f \in T) = \det M(e, f)$ where $M$ is the matrix of values of $H$ as in Theorem 1.2.

Theorem 1.2 has in fact now been proved for $r = 1, 2$. The procedure for general $r$
will be similar. Write $P(e_1, \ldots, e_r \in T)$ as a product of conditional probabilities
$P(e_i \in T \mid e_{i+1}, \ldots, e_r \in T)$. Then evaluate this conditional probability by solving for voltages
on $G/e_{i+1} \cdots e_r$. This is done by placing batteries across $e_i, \ldots, e_r$ so as to equalize
voltages across all $e_{i+1}, \ldots, e_r$ simultaneously. Although in the $r = 2$ case it was not
necessary to worry about dividing by zero, this problem does come up in the general case
which causes an extra step in the proof. We will now summarily generalize the above
discussion on how to solve for voltages on contractions of a graph and then forget about
electricity altogether.

Lemma 4.1 Let $G$ be a finite $D$-regular connected graph and let $f_1, \ldots, f_r$ and $e = \vec{xy}$ be
edges of $G$ that form no cycle. Let $p$ be the map from $G$ to $G/f_1 \cdots f_r$ that maps edges to
corresponding edges and maps vertices of $G$ to their equivalence classes under the relation
of being connected by edges in $\{f_1, \ldots, f_r\}$. Let $V$ be a function on the vertices of $G$ such
that

(i) If $\vec{zw} = f_i$ for some $i$ then $V(z) = V(w)$;

(ii) $\sum_{z \in p^{-1}(v)} \Delta V(z) = +1$ if $p(x) = v$, $-1$ if $p(y) = v$ and zero otherwise.

If $T$ is a uniform spanning tree for $G$ then $P(e \in T \mid f_1, \ldots, f_r \in T) = V(x) - V(y)$.

Proof: As before, we know that $P(e \in T \mid f_1, \ldots, f_r \in T)$ is given by $\bar V(p(x)) - \bar V(p(y))$
where $\bar V$ is the voltage function on $G/f_1 \cdots f_r$ for a unit current supplied in at $p(x)$ and out
at $p(y)$. Defining $V(v)$ to be $\bar V(p(v))$, the lemma will be proved if we can show that $V$ is
the unique function on the vertices of $G$ satisfying (i) and (ii). Seeing that $V$ satisfies (i)
and (ii) is the same as before. Since $p$ provides a one to one correspondence between
edges of $G$ and edges of $G/f_1 \cdots f_r$, the excess of $V$ summed over the vertices of $p^{-1}(v)$ is the sum
over edges leading out of vertices in $p^{-1}(v)$ of the difference of $V$ across that edge, which
is the sum over edges leading out of $v$ of the difference of $\bar V$ across that edge; this is
the excess of $\bar V$ at $v$ which is $+1$, $-1$ or $0$ according to whether $x$ or $y$ or neither is in
$p^{-1}(v)$.

Uniqueness is also easy. If $W$ is any function satisfying (i), define a function $\bar W$ on
the vertices of $G/f_1 \cdots f_r$ by $\bar W(p(v)) = W(v)$. If $W$ satisfies (ii) as well then it is easy
to check that $\bar W$ satisfies $\Delta \bar W = \delta_{p(x)} - \delta_{p(y)}$ so that $\bar W = \bar V$ and $W = V$. □

4.2 Proof of the transfer-impedance theorem

First of all, though it is true that the function $H$ in the previous subsection and the
statement of the theorem is symmetric, I'm not going to include a proof; nothing else
we talk about relies on symmetry of $H$ and a proof may be found in any standard
treatment of the Green's function, such as [16]. Secondly, it is easiest to reduce the
problem to the case of $D$-regular graphs immediately so as to be able to use the previous
lemma. Suppose $G$ is any finite connected graph. Let $D$ be the maximum degree of any
vertex in $G$ and to any vertex of lesser degree $k$, add $D - k$ self-edges. The resulting
graph is $D$-regular (though not simple) and furthermore it has the same spanning trees
as $G$. To prove Theorem 1.2 for finite connected graphs, it therefore suffices to prove the
theorem for finite, connected, $D$-regular graphs. Restating what is to be proved:

Theorem 4.2 Let $G$ be any finite, connected, $D$-regular graph and let $T$ be a uniform
random spanning tree of $G$. Let $H(xy, zw)$ be the voltage induced across $zw$ when one
amp is supplied from $x$ to $y$. Then for any $e_1, \ldots, e_r \in G$,

$$P(e_1, \ldots, e_r \in T) = \det M(e_1, \ldots, e_r)$$

where $M(e_1, \ldots, e_r)$ is the $r$ by $r$ matrix whose $(i,j)$ entry is $H(e_i, e_j)$.

The proof is by induction on $r$. We have already proved it for $r = 1, 2$, so now we
assume it for $r - 1$ and try to prove it for $r$. There are two cases. The first possibility is
that $P(e_1, \ldots, e_{r-1} \in T) = 0$. This means that no spanning tree of $G$ contains $e_1, \ldots, e_{r-1}$
which means that these edges contain some cycle. Say the cycle is $e_{n(0)}, \ldots, e_{n(k-1)}$
where there are vertices $v(i)$ for which $e_{n(i)}$ connects $v(i)$ to $v(i+1 \bmod k)$. For any
vertices $x, y$, $\phi_{xy}$ is the unique solution up to an additive constant of $\Delta \phi_{xy} = \delta_x - \delta_y$.
Thus $\Delta \big( \sum_{i=0}^{k-1} \phi_{v(i)v(i+1 \bmod k)} \big) = 0$ which means that $\sum_{i=0}^{k-1} \phi_{v(i)v(i+1 \bmod k)}$ is constant.
Then for any $xy$,

$$\sum_{i=0}^{k-1} H(e_{n(i)}, xy) = \sum_{i=0}^{k-1} \phi_{v(i)v(i+1 \bmod k)}(x) - \sum_{i=0}^{k-1} \phi_{v(i)v(i+1 \bmod k)}(y) = 0.$$

This says that in the matrix $M(e_1, \ldots, e_r)$, the rows $n(0), \ldots, n(k-1)$ are linearly dependent,
summing to zero. Then $\det M(e_1, \ldots, e_r) = 0$ which is certainly the probability of
$e_1, \ldots, e_r \in T$.

The second possibility is that $P(e_1, \ldots, e_{r-1} \in T) \neq 0$. We can then write

$$\begin{aligned}
P(e_1, \ldots, e_r \in T) &= P(e_1, \ldots, e_{r-1} \in T)\, P(e_r \in T \mid e_1, \ldots, e_{r-1} \in T) \\
&= \det M(e_1, \ldots, e_{r-1})\, P(e_r \in T \mid e_1, \ldots, e_{r-1} \in T)
\end{aligned}$$

by the induction hypothesis. To evaluate the last term we look for a function $V$ satisfying
the conditions of Lemma 4.1 with $e_r$ instead of $e$ and $e_1, \ldots, e_{r-1}$ instead of $f_1, \ldots, f_r$.
For $i \le r-1$, let $x_i$ and $y_i$ denote the vertices connected by $e_i$. For any $v \in G/e_1 \cdots e_{r-1}$
and any $i \le r-1$, $\sum_{z \in p^{-1}(v)} \Delta \phi_{x_i y_i}(z) = \sum_{z \in p^{-1}(v)} \big( \delta_{x_i}(z) - \delta_{y_i}(z) \big)$ which is zero since the
class $p^{-1}(v)$ contains both $x_i$ and $y_i$ or else contains neither. The excess of $\phi_{x_r y_r}$ summed
over $p^{-1}(v)$ is just $1$ if $p(x_r) = v$, $-1$ if $p(y_r) = v$ and zero otherwise. By linearity of
excess, this implies that the sum of $\phi_{x_r y_r}$ with any linear combination of $\{\phi_{x_i y_i} : i \le r-1\}$
satisfies (ii) of the lemma.

Satisfying part (i) is then a matter of choosing the right linear combination, but
the lovely thing is that we don't have to actually compute it! We do need to know it
exists and here's the argument for that. The $i$th row of $M(e_1, \ldots, e_r)$ lists the values
of $\phi_{x_i y_i}(x_j) - \phi_{x_i y_i}(y_j)$ as $j$ runs from $1$ to $r$. Looking for $c_1, \ldots, c_{r-1}$ such that $\phi_{x_r y_r} +
\sum_{i=1}^{r-1} c_i \phi_{x_i y_i}$ is the same on $x_j$ as on $y_j$ for $j \le r-1$ is the same as looking for $c_i$ for
which the $r$th row of $M$ plus the sum of $c_i$ times the $i$th row of $M$ has zeros for every
entry except the $r$th. In other words we want to row-reduce, using the first $r-1$ rows
to produce $r-1$ zeros in the last row. There is a unique way to do this precisely when the
determinant of the upper $r-1$ by $r-1$ submatrix is nonzero, which is what we have
assumed. So these $c_1, \ldots, c_{r-1}$ exist and $V(v) = \phi_{x_r y_r}(v) + \sum_{i=1}^{r-1} c_i \phi_{x_i y_i}(v)$.

The lemma tells us that $P(e_r \in T \mid e_1, \ldots, e_{r-1} \in T)$ is $V(x_r) - V(y_r)$. This is just the
$r,r$-entry of the row-reduced matrix. Now calculate the determinant of the row-reduced
matrix in two ways. Firstly, since row-reduction does not change the determinant of a
matrix, the determinant must still be $\det M(e_1, \ldots, e_r)$. On the other hand, since the
last row is all zeros except the last entry, expanding along the last row gives that the
determinant is the $r,r$-entry times the determinant of the upper $r-1$ by $r-1$ submatrix,
which is just $P(e_r \in T \mid e_1, \ldots, e_{r-1} \in T) \det M(e_1, \ldots, e_{r-1})$. Setting these two equal
gives

$$P(e_r \in T \mid e_1, \ldots, e_{r-1} \in T) = \det M(e_1, \ldots, e_r) / \det M(e_1, \ldots, e_{r-1}).$$

The induction hypothesis says that

$$P(e_1, \ldots, e_{r-1} \in T) = \det M(e_1, \ldots, e_{r-1})$$

and multiplying the conditional and unconditional probabilities proves the theorem. □
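The theorem is easy to test on a graph small enough to enumerate. The sketch below is illustrative rather than part of the proof: it grounds one vertex, solves the unit-resistor Laplacian system exactly to get the voltages $\phi_{xy}$, builds the transfer-impedance matrix for a chosen list of oriented edges, and compares its determinant with a brute-force count of spanning trees on $K_4$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a must be invertible)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def det(m):
    """Determinant by cofactor expansion along the first row (fine for small r)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def potential(n, edges, x, y):
    """Voltages for one amp in at x and out at y, grounded at vertex n - 1."""
    lap = [[0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    rhs = [0] * n
    rhs[x] += 1
    rhs[y] -= 1
    reduced = [row[:n - 1] for row in lap[:n - 1]]
    return solve(reduced, rhs[:n - 1]) + [Fraction(0)]

def transfer_impedance_matrix(n, edges, chosen):
    """Rows are indexed by the current edge, columns by the measured edge."""
    rows = []
    for x, y in chosen:
        phi = potential(n, edges, x, y)
        rows.append([phi[z] - phi[w] for z, w in chosen])
    return rows

def is_spanning_tree(n, subset):
    """True if the n - 1 edges in subset contain no cycle."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in subset:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def brute_force_probability(n, edges, chosen):
    trees = [t for t in combinations(edges, n - 1) if is_spanning_tree(n, t)]
    hits = [t for t in trees if all(e in t for e in chosen)]
    return Fraction(len(hits), len(trees))

K4 = list(combinations(range(4), 2))
pair = [(0, 1), (0, 2)]
path = [(0, 1), (1, 2), (2, 3)]
for chosen in (pair, path):
    print(det(transfer_impedance_matrix(4, K4, chosen)),
          brute_force_probability(4, K4, chosen))
```

On $K_4$, which has 16 spanning trees, the two adjacent edges give 3/16 and the three-edge path gives 1/16 by either method.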

4.3 A few computational examples 

It's time to take a break from theorem-proving to see how well the machinery we've
built actually works. A good place to test it is the graph $T_3$, since the calculations have
essentially been done, and since even $T_3$ is large enough to prohibit enumeration of the
spanning trees directly by hand (you can use the Matrix-Tree Theorem with all weights
one to check that there are 11664 of them). Say we want to know the probability that
the middle vertex $A$ is connected to $B$, $C$ and $D$ in a uniform random spanning tree $T$
of $T_3$.

We need then to calculate the transfer-impedance matrix for the edges $AB$, $AC$ and
$AD$. Let's say we orient them all toward $A$. The symmetry of $T_3$ under translation and
90° rotation allows us to rely completely on the voltages calculated at the end of 3.5.
Sliding the picture upwards one square and multiplying the given voltages by 4/9 to
produce a unit current flow from $B$ to $A$ gives voltages

5/18   3/18   4/18
8/18    0     4/18
5/18   3/18   4/18

which gives transfer-impedances $H(BA, BA) = 8/18$, $H(BA, CA) = 3/18$ and $H(BA, DA) =
4/18$. The rest of the values follow by symmetry, giving



$$M(BA, CA, DA) = \frac{1}{18} \begin{pmatrix} 8 & 3 & 4 \\ 3 & 8 & 3 \\ 4 & 3 & 8 \end{pmatrix} .$$

Applying Theorem 4.2 gives $P(BA, CA, DA \in T) = \det M(BA, CA, DA) = 312/5832$, or in
other words just 624 of the 11664 spanning trees of $T_3$ contain all these edges. Compare
this to using the Matrix-Tree Theorem to calculate the same probability. That does not
require the preliminary calculation of the voltages, but it does require an eight by eight
determinant.
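The three by three determinant is small enough to check with exact fractions. This is only an arithmetic sketch; `det3` is an illustrative helper using the standard cofactor formula.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3 by 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Transfer-impedance matrix for BA, CA, DA on T_3, entries in units of 1/18.
M = [[Fraction(x, 18) for x in row] for row in ([8, 3, 4], [3, 8, 3], [4, 3, 8])]

p = det3(M)
trees = p * 11664  # multiply by the total number of spanning trees of T_3
print(p, trees)
```

The probability reduces to 13/243, which is 312/5832, and multiplying by 11664 gives the 624 trees quoted above.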

Suppose we want now to calculate the probability that $A$ is a leaf of $T$, that is to say
there is only one edge in $T$ incident to $A$. By symmetry this edge will be $AB$ 1/4 of the
time, so we need to calculate $P(BA \in T \text{ and } CA, DA, EA \notin T)$ and then multiply by
four. As remarked earlier, we can use inclusion-exclusion to get the answer. This would
entail writing

$$\begin{aligned}
P(BA \in T \text{ and } CA, DA, EA \notin T)
&= P(BA \in T) - P(BA, CA \in T) - P(BA, DA \in T) - P(BA, EA \in T) \\
&\quad + P(BA, CA, DA \in T) + P(BA, CA, EA \in T) + P(BA, DA, EA \in T) \\
&\quad - P(BA, CA, DA, EA \in T).
\end{aligned}$$

This is barely manageable for four edges, and gets exponentially messier as we want
to know about probabilities involving more edges. Here is an easy but useful theorem
telling how to calculate the probability of a general cylinder event, namely the event that
$e_1, \ldots, e_r$ are in the tree, while $f_1, \ldots, f_s$ are not in the tree.

Theorem 4.3 Let $M(e_1, \ldots, e_k)$ be a $k$ by $k$ transfer-impedance matrix. For $r \le k$ let $M^{(k-r)}$ be the
matrix for which $M^{(k-r)}(i,j) = M(i,j)$ if $i \le r$ and $M^{(k-r)}(i,j) = \delta_{ij} - M(i,j)$ if $r+1 \le i \le k$,
where $\delta_{ij}$ is one if $i = j$ and zero otherwise.
Then $P(e_1, \ldots, e_r \in T \text{ and } e_{r+1}, \ldots, e_k \notin T) = \det M^{(k-r)}$.

Proof: The proof is by induction on $k - r$. The initial step is when $r = k$; then $M^{(0)} = M$
so the theorem reduces to Theorem 4.2. Now suppose the theorem to be true for $k - r = s$
and let $k - r = s + 1$. Write

$$\begin{aligned}
P(e_1, \ldots, e_r \in T \text{ and } e_{r+1}, \ldots, e_k \notin T)
&= P(e_1, \ldots, e_r \in T \text{ and } e_{r+2}, \ldots, e_k \notin T) \\
&\quad - P(e_1, \ldots, e_{r+1} \in T \text{ and } e_{r+2}, \ldots, e_k \notin T) \\
&= \det M^{(s)}(e_1, \ldots, e_r, e_{r+2}, \ldots, e_k) - \det M^{(s)}(e_1, \ldots, e_{r+1}, e_{r+2}, \ldots, e_k),
\end{aligned}$$

since the induction hypothesis applies to both of the last two probabilities. Call these
last two matrices $M_1$ and $M_2$. The trick now is to stick an extra row and column into
$M_1$. Let $M'$ be $M^{(s)}(e_1, \ldots, e_k)$ with the $r+1$st row replaced by zeros except for a one in
the $r+1$st position. Then $M'$ is $M_1$ with an extra row and column inserted. Expanding
along the extra row gives $\det M' = \det M_1$. But $M'$ and $M_2$ differ only in the $r+1$st row,
so by multilinearity of the determinant,

$$\det M_1 - \det M_2 = \det M' - \det M_2 = \det M''$$

where $M''$ agrees with $M'$ and $M_2$ except that the $r+1$st row is the difference of the
$r+1$st rows of $M'$ and $M_2$. The induction is done as soon as you realize that $M''$ is just
$M^{(s+1)}(e_1, \ldots, e_k)$. □

Applying this to the probability of $A$ being a leaf of $T_3$, we write

$$P(BA \in T \text{ and } CA, DA, EA \notin T) = \det M^{(3)}(BA, CA, DA, EA)
= \begin{vmatrix} 8/18 & 3/18 & 4/18 & 3/18 \\ -3/18 & 10/18 & -3/18 & -4/18 \\ -4/18 & -3/18 & 10/18 & -3/18 \\ -3/18 & -4/18 & -3/18 & 10/18 \end{vmatrix}
= \frac{1176}{11664},$$

so $A$ is a leaf of $4 \cdot 1176 = 4704$ of the 11664 spanning trees of $T_3$. This time, the
Matrix-Tree Theorem would have required evaluation of several different eight by eight
determinants. If $T_3$ were replaced by $T_n$, the transfer-impedance calculation would not
be significantly harder, but the Matrix-Tree Theorem would require several $n^2$ by $n^2$
determinants. If $n$ goes to $\infty$, as it might when calculating some sort of limit behavior,
these large determinants would not be tractable.
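The leaf computation can be reproduced with a few lines of exact arithmetic. In this sketch the fourth row and column of the transfer-impedance matrix (the edge $EA$) are filled in by the rotational symmetry of $T_3$, which is an inference from the three by three matrix rather than something tabulated in the text; `cylinder_matrix` is a hypothetical helper that complements every row after the first `r`.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def cylinder_matrix(M, r):
    """Keep the first r rows of M; replace each later row by (identity row) - (row of M)."""
    k = len(M)
    return [[M[i][j] if i < r else (1 if i == j else 0) - M[i][j] for j in range(k)]
            for i in range(k)]

# Edges BA, CA, DA, EA at the middle vertex A of T_3, in units of 1/18.
# B is opposite D and C is opposite E, so opposite pairs get 4 and adjacent pairs get 3.
M = [[Fraction(x, 18) for x in row]
     for row in ([8, 3, 4, 3], [3, 8, 3, 4], [4, 3, 8, 3], [3, 4, 3, 8])]

p_only_BA = det(cylinder_matrix(M, 1))  # BA in T, the other three edges not in T
p_leaf = 4 * p_only_BA                  # any one of the four edges may be the leaf edge
print(p_only_BA, p_leaf * 11664)
```

The single determinant replaces the eight-term inclusion-exclusion sum, and the result matches the count of 4704 trees.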



5 Poisson limits 



As mentioned in the introduction, the random degree of a vertex in a uniform spanning 
tree of $G$ converges in distribution to one plus a Poisson(1) random variable as $G$ gets
larger and more highly connected. This section investigates some such limits, beginning 
with an example symmetric enough to compute explicitly. The reason for this limit may 
seem clearer at the end of the section when we discuss a stronger limit theorem. Proofs 
in this section are mostly sketched since the details occupy many pages in [6]. 



5.1 The degree of a vertex in $K_n$

The simplest situation in which to look for a Poisson limit is on the complete graph $K_n$.
This is pictured here for $n = 8$.

figure 9



Calculating the voltages for a complete graph is particularly easy because of all the
symmetry. Say the vertices of $K_n$ are called $v_1, \ldots, v_n$, and put a one volt battery across
$v_1$ and $v_2$, so $V(v_1) = 1$ and $V(v_2) = 0$. By Theorem 3.4, the voltage at any other vertex
$v_j$ is equal to the probability that $SRW_{v_j}^{K_n}$ hits $v_1$ before $v_2$. This is clearly equal to 1/2.
The total current flow out of $v_1$ with these voltages is $n/2$, since one amp flows along the
edge to $v_2$ and 1/2 amp flows along each of the $n-2$ other edges out of $v_1$. Multiplying
by $2/n$ to get a unit current flow gives voltages

$$V(v_i) = \begin{cases} 2/n & : i = 1 \\ 0 & : i = 2 \\ 1/n & \text{otherwise.} \end{cases}$$

The calculations will of course come out similarly for a unit current flow supplied across
any other edge of $K_n$.

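The first distribution we are going to examine is of the degree in $T$ of a vertex, say
$v_1$. Since we are interested in which of the edges incident to $v_1$ are in $T$, we need to
calculate $H(v_1v_i, v_1v_j)$ for every $i, j \neq 1$. Orienting all of these edges away from $v_1$ and
using the voltages we just worked out gives

$$H(v_1v_i, v_1v_j) = \begin{cases} 2/n & : i = j \\ 1/n & \text{otherwise.} \end{cases}$$

Denoting the edge from $v_1$ to $v_i$ by $e_i$, we have the $n-1$ by $n-1$ matrices

$$M(e_2, \ldots, e_n) = \begin{pmatrix} \frac{2}{n} & \frac{1}{n} & \cdots & \frac{1}{n} \\ \frac{1}{n} & \frac{2}{n} & \cdots & \frac{1}{n} \\ \vdots & & \ddots & \vdots \\ \frac{1}{n} & \frac{1}{n} & \cdots & \frac{2}{n} \end{pmatrix}, \qquad
M^{(n-1)}(e_2, \ldots, e_n) = \begin{pmatrix} \frac{n-2}{n} & \frac{-1}{n} & \cdots & \frac{-1}{n} \\ \frac{-1}{n} & \frac{n-2}{n} & \cdots & \frac{-1}{n} \\ \vdots & & \ddots & \vdots \\ \frac{-1}{n} & \frac{-1}{n} & \cdots & \frac{n-2}{n} \end{pmatrix} .$$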

There must be at least one edge in $T$ incident to $v_1$ so Theorem 4.3 says $\det M^{(n-1)} =
P(e_2, \ldots, e_n \notin T) = 0$. This is easy to verify: the rows sum to zero. We can use $M^{(n-1)}$
to calculate the probability that $e_2$ is the only edge in $T$ incident to $v_1$ by noting that
this happens if and only if $e_3, \ldots, e_n \notin T$. This is the determinant of $M^{(n-2)}(e_3, \ldots, e_n)$
which is a matrix smaller by one than $M^{(n-1)}(e_2, \ldots, e_n)$ but which still has $(n-2)/n$'s
down the diagonal and $-1/n$'s elsewhere. This is a special case of a circulant matrix,
which is a type of matrix whose determinant is fairly easy to calculate.

A $k$ by $k$ circulant matrix is an $M$ for which $M(i,j)$ is some number $a_{i-j}$ depending
only on $i - j$ mod $k$. Thus $M$ has $a_0$ all down the diagonal for some $a_0$, $a_1$ on the next
diagonal, and so forth. The eigenvalues $\lambda_0, \ldots, \lambda_{k-1}$ of a circulant matrix are given by
$\lambda_j = \sum_{t=0}^{k-1} a_t \zeta^{jt}$ where $\zeta = e^{2\pi i/k}$ is the $k$th root of unity. It is easy to verify that these
are the eigenvalues, by checking that the vector $w$ for which $w_t = \zeta^{jt}$ is an eigenvector
for $M$ (no matter what the $a_i$ are) and has eigenvalue $\lambda_j$. The determinant is then the
product of the eigenvalues. Details of this may be found in [17].

In the case of $M^{(n-2)}$, $k = n-2$, $a_0 = (n-2)/n$ and $a_j = -1/n$ for $j \neq 0$. Then $\lambda_0 = \sum_j a_j =
1/n$. To calculate the other eigenvalues note that for any $j \neq 0$ mod $n-2$, $\sum_{t=0}^{n-3} \zeta^{jt} = 0$.
Then $\lambda_j = (n-2)/n + \sum_{t=1}^{n-3} (-1/n)\zeta^{jt} = (n-1)/n - (1/n)\sum_{t=0}^{n-3} \zeta^{jt} = (n-1)/n$. This
gives

$$\det M^{(n-2)} = \prod_{j=0}^{n-3} \lambda_j = \frac{1}{n} \left( \frac{n-1}{n} \right)^{n-3} = (1 + o(1)) \frac{1}{ne}$$

as $n \to \infty$.$^2$ Part of the Poisson limit has emerged: the probability that $v_1$ has degree
one in $T$ is (by symmetry) $n-1$ times the probability that the particular edge $e_2$ is the
only edge in $T$ incident to $v_1$; this is $(n-1)(1 + o(1))/en$ so it converges to $e^{-1}$ as $n \to \infty$.
This is $P(X = 1)$ where $X$ is one plus a Poisson(1), i.e. a Poisson of mean one.
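The circulant calculation can be cross-checked numerically. The sketch below is illustrative only: `det` is a generic exact elimination routine, `circulant_det` multiplies the eigenvalues $\sum_t a_t \zeta^{jt}$ in floating point, and `only_edge_probability` builds the $(n-2)$ by $(n-2)$ matrix with $(n-2)/n$ on the diagonal and $-1/n$ elsewhere.

```python
import cmath
import math
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination over exact rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def circulant_det(first_row):
    """Product of the eigenvalues sum_t a_t zeta^(j t) of a circulant matrix."""
    k = len(first_row)
    zeta = cmath.exp(2j * math.pi / k)
    product = 1
    for j in range(k):
        product *= sum(a * zeta ** (j * t) for t, a in enumerate(first_row))
    return product.real

def only_edge_probability(n):
    """P(e_2 is the only tree edge at v_1) in K_n, as an exact determinant."""
    k = n - 2
    m = [[Fraction(n - 2, n) if i == j else Fraction(-1, n) for j in range(k)]
         for i in range(k)]
    return det(m)

def degree_one_probability(n):
    return (n - 1) * only_edge_probability(n)

for n in (8, 20, 60):
    print(n, float(degree_one_probability(n)), math.exp(-1))
```

The exact determinant agrees with $(1/n)((n-1)/n)^{n-3}$ for each $n$ tried, and the degree-one probability drifts toward $e^{-1}$ as $n$ grows.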

Each further part of the Poisson limit requires a more careful evaluation of the limit.
To illustrate, we carry out the second step. Use one more degree of precision in the
Taylor series for $\ln(x)$ and $\exp(x)$ to get

$$\begin{aligned}
\det M^{(n-2)} = n^{-1} \left( \frac{n-1}{n} \right)^{n-3} &= n^{-1} \exp[(n-3)(-n^{-1} - n^{-2}(1/2 + o(1)))] \\
&= n^{-1} \exp[-1 + (5/2 + o(1))n^{-1}] \\
&= n^{-1} e^{-1} [1 + (5/2 + o(1))n^{-1}].
\end{aligned}$$

The reason we need this precision is that we are going to calculate the probability of $v_1$
having degree 2 by summing the $P(e, f \text{ are the only edges incident to } v_1 \text{ in } T)$ over all
pairs of edges $e, f$ coming out of $v_1$. By symmetry this is just $(n-1)(n-2)/2$ times
the probability that the particular edges $e_2$ and $e_3$ are the only edges in $T$ incident to
$v_1$. This probability is the determinant of a matrix which is not a circulant, and to avoid
calculating a difficult determinant it is better to write this probability as the following
difference: the probability that no edges other than $e_2$ and $e_3$ are incident to $v_1$ minus
the probability that $e_2$ is the only edge incident to $v_1$ minus the probability that $e_3$ is
the only edge incident to $v_1$. Since the final probability is this difference multiplied by
$(n-1)(n-2)/2$, the difference should be of order $n^{-2}$, which explains why this degree
of precision is required for the latter two probabilities.

$^2$ Here, $o(1)$ signifies a quantity going to zero as $n \to \infty$. This is a convenient and standard notation
that allows manipulation such as $(2 + o(1))(3 + o(1)) = 6 + o(1)$.

The probability of $T$ containing no edges incident to $v_1$ other than $e_2$ and $e_3$ is the
determinant of $M^{(n-3)}(e_4, \ldots, e_n)$, which is an $n-3$ by $n-3$ circulant again having
$(n-2)/n$ on the diagonal and $-1/n$ elsewhere. Then $\lambda_0 = \sum_{j=0}^{n-4} a_j = 2/n$ and $\lambda_j =
(n-1)/n$ for $j \neq 0$ mod $n-3$, yielding

$$\det M^{(n-3)} = 2n^{-1} \left( \frac{n-1}{n} \right)^{n-4} = 2n^{-1} e^{-1} [1 + (7/2 + o(1))n^{-1}]$$

in the same manner as before. Subtracting off the probabilities of $e_2$ or $e_3$ being the only
edge in $T$ incident to $v_1$ gives

$$\begin{aligned}
P(e_2, e_3 \in T, e_4, \ldots, e_n \notin T)
&= 2n^{-1} e^{-1} [1 + (7/2 + o(1))n^{-1}] - 2n^{-1} e^{-1} [1 + (5/2 + o(1))n^{-1}] \\
&= (2 + o(1)) n^{-2} e^{-1}.
\end{aligned}$$

Multiplying by $(n-1)(n-2)/2$ gives

$$P(v_1 \text{ has degree 2 in } T) \to e^{-1}$$

as $n \to \infty$, which is $P(X = 2)$ where $X$ is one plus a Poisson(1).
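For moderate $n$ the degree-two probability can also be computed exactly, which gives a check on the asymptotics. The sketch below is an illustration under stated assumptions: `degree_two_probability` evaluates the non-circulant determinant directly (first two rows of $M$ kept, the remaining rows complemented as in Theorem 4.3), while `difference_formula` evaluates the difference of circulant determinants used in the text, before any Taylor expansion. Both names are hypothetical.

```python
import math
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination over exact rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def degree_two_probability(n):
    """P(v_1 has degree 2) in K_n from one determinant per pair of edges."""
    k = n - 1  # the edges e_2, ..., e_n at v_1
    M = [[Fraction(2 if i == j else 1, n) for j in range(k)] for i in range(k)]
    mixed = [[M[i][j] if i < 2 else (1 if i == j else 0) - M[i][j] for j in range(k)]
             for i in range(k)]
    pairs = Fraction((n - 1) * (n - 2), 2)
    return pairs * det(mixed)

def difference_formula(n):
    """The same probability from the two circulant determinants in the text."""
    q = Fraction(n - 1, n)
    none_but_two = Fraction(2, n) * q ** (n - 4)  # det M^(n-3)
    only_one = Fraction(1, n) * q ** (n - 3)      # det M^(n-2)
    pairs = Fraction((n - 1) * (n - 2), 2)
    return pairs * (none_but_two - 2 * only_one)

for n in (5, 10, 30, 60):
    print(n, float(degree_two_probability(n)), math.exp(-1))
```

The two routes agree exactly for every $n$ tried, and the values approach $e^{-1}$, in line with the limit above.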

5.2 Another point of view 

The calculations of the last section may be continued ad infinitum, but each step requires
a more careful estimate so it pays to look for a way to do all the steps at once. The right
alternative method will be more readily apparent if we generalize to graphs other than
$K_n$ which do not admit such a precise calculation (if a tool that is difficult to use breaks,
you may discover a better one).

The important feature about $K_n$ was that the voltages were easy to calculate. There
is a large class of graphs for which the voltages are just as easy to calculate approximately.
The term "approximately" can be made more rigorous by considering sequences of graphs
$G_n$ and stating approximations in terms of limits as $n \to \infty$. Since I've always wanted to
name a technical term after my dog, call a sequence of graphs $G_n$ Gino-regular if there
is a sequence $D_n$ such that

(i) The maximum and minimum degree of a vertex in $G_n$ are $(1 + o(1))D_n$ as
$n \to \infty$; and

(ii) The maximum and minimum over vertices $x \neq y, z$ of $G_n$ of the probability
that $SRW_x^{G_n}$ hits $y$ before $z$ are $1/2 + o(1)$ as $n \to \infty$.

Condition (ii) implies that $D_n \to \infty$, so the graphs $G_n$ are growing locally. It is not hard
to see that the voltage $V(z)$ in a unit current flow across any edge $e = \vec{xy}$ of a graph
$G_n$ in a Gino-regular sequence is $(1 + o(1))D_n^{-1}(\delta_x - \delta_y)(z)$ uniformly over all choices of
$x, y, z \in G_n$ as $n \to \infty$. The complete graphs $K_n$ are Gino-regular. So are the $n$-cubes,
$B_n$, whose vertex sets are all the $n$-long sequences of zeros and ones and whose edges
connect sequences differing in only one place.

figure 10



To see why $\{B_n\}$ is Gino-regular, consider the "worst case" when $x$ is a neighbor
of $y$. There is a small probability that $SRW_x(1)$ will equal $y$, small because this is
$\mathrm{degree}(x)^{-1} = (1 + o(1))D_n^{-1}$ which is going to zero. There are even smaller probabilities
of reaching $y$ in the next few steps; in general, unless $SRW_x$ hits $y$ in one step, it tends
to get "lost" and by the time it comes near $y$ or $z$ again it is thoroughly random and
is equally likely to hit $y$ or $z$ first. In fact Gino-regular sequences may be thought of as
sequences of graphs that are nearly degree-regular, on which SRW gets lost quickly.
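
Condition (ii) can be checked by hand on small examples. The following Python sketch is not from the original text: it computes the probability that simple random walk from x hits y before z by solving the harmonic equations h(y) = 1, h(z) = 0, h(u) = average of h over the neighbors of u, in exact rational arithmetic. The names `hit_before`, `complete_graph` and `cube` are made up for this illustration.

```python
from fractions import Fraction


def hit_before(adj, x, y, z):
    """P(simple random walk from x hits y before z).

    adj maps each vertex to the list of its neighbors; x, y, z are assumed
    to be distinct vertices of a connected graph.
    """
    free = [u for u in adj if u != y and u != z]
    idx = {u: i for i, u in enumerate(free)}
    m = len(free)
    # Row for u encodes h(u) - (1/deg u) * sum_{w ~ u} h(w) = 0, with the
    # known values h(y) = 1 and h(z) = 0 moved to the last column.
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for u in free:
        i = idx[u]
        A[i][i] += 1
        p = Fraction(1, len(adj[u]))
        for w in adj[u]:
            if w == y:
                A[i][m] += p
            elif w != z:
                A[i][idx[w]] -= p
    # Gauss-Jordan elimination in exact arithmetic.
    for c in range(m):
        piv = next(r for r in range(c, m) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        lead = A[c][c]
        A[c] = [a / lead for a in A[c]]
        for r in range(m):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return A[idx[x]][m]


def complete_graph(n):
    """Adjacency lists of K_n on vertices 0..n-1."""
    return {u: [w for w in range(n) if w != u] for u in range(n)}


def cube(n):
    """Adjacency lists of the n-cube; vertices are n-bit integers."""
    return {u: [u ^ (1 << i) for i in range(n)] for u in range(1 << n)}


if __name__ == "__main__":
    # Worst case from the text: x = 0 is a neighbor of y = 1; z is the antipode of x.
    for n in range(2, 7):
        print(n, float(hit_before(cube(n), 0, 1, (1 << n) - 1)))
```

On $K_n$ the answer is exactly 1/2 by symmetry, while on the 4-cycle $B_2$ the worst case gives 2/3; the printed values for larger cubes let you watch the bias toward the neighboring vertex shrink.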

The approximate voltages give approximate transfer-impedances $H(e, f) = (2 + o(1))/n$
if $e = f$, $(1 + o(1))/n$ if $e$ and $f$ meet at a single vertex (choose orientations
away from the vertex) and $o(1)/n$ if $e$ and $f$ do not meet. The determinant of a matrix is
continuous in its entries, so it may seem that we have everything necessary to calculate
limiting probabilities as limits of determinants of transfer-impedance matrices. If $v$ is
a vertex in $G_k$ and $e_1, \ldots, e_n$ are the edges incident to $v$ in $G_k$ (so $n \sim D_k$), then the
probability of $e_1$ being the only edge in $T$ incident to $v$ is the determinant of

$$M^{(n-1)}(e_2, \ldots, e_n) = \begin{pmatrix}
(n-2+o(1))/n & (-1+o(1))/n & \cdots & (-1+o(1))/n \\
(-1+o(1))/n & (n-2+o(1))/n & \cdots & (-1+o(1))/n \\
\vdots & & \ddots & \vdots \\
(-1+o(1))/n & (-1+o(1))/n & \cdots & (n-2+o(1))/n
\end{pmatrix}.$$

Unfortunately, the matrix is changing size as $n \to \infty$, so convergence of each entry to a
known limit does not give us the limit of the determinant.

If the matrix were staying the same size, the problem would disappear. This means we
can successfully take the limit of probabilities of events as long as they involve a bounded
number of edges. Thus for any fixed edge $e_1$, $P(e_1 \in T) = \det M(e_1) = (1 + o(1))(2/n)$.
For any fixed pair of edges $e_1$ and $e_2$ incident to the same vertex,

$$P(e_1, e_2 \in T) = \det M(e_1, e_2) = \begin{vmatrix}
(2+o(1))/n & (1+o(1))/n \\
(1+o(1))/n & (2+o(1))/n
\end{vmatrix} = (3 + o(1))n^{-2}.$$

In general if $e_1, \ldots, e_r$ are all incident to $v$ then the transfer-impedance matrix is $n^{-1}$
times an $r$ by $r$ matrix converging to the matrix with 2 down the diagonal and 1 elsewhere.
The eigenvalues of this circulant are $\lambda_0 = r + 1$ and $\lambda_j = 1$ for $j \neq 0$, yielding

$$P(e_1, \ldots, e_r \in T) = (r + 1 + o(1))n^{-r}.$$
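
As a sanity check on the determinant formula, here is a small Python sketch (my own illustration, with made-up function names). It computes the determinant of the $r \times r$ matrix with $2/n$ on the diagonal and $1/n$ elsewhere, and compares it with a brute-force count over all spanning trees of $K_n$. The assumption being tested is that on $K_n$ the $o(1)$ terms vanish, so that the probability of seeing $r$ given edges at a vertex is exactly $(r+1)n^{-r}$.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant by exact Gaussian elimination on a list of rows."""
    M = [row[:] for row in M]
    n = len(M)
    d = Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d


def star_matrix(r, n):
    """r x r matrix with 2/n on the diagonal and 1/n elsewhere."""
    return [[Fraction(2 if i == j else 1, n) for j in range(r)] for i in range(r)]


def is_spanning_tree(n, subset):
    """True if the n-1 given edges contain no cycle (union-find check)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    for a, b in subset:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True


def star_probability(n, r):
    """P(edges 0-1, ..., 0-r all lie in a uniform spanning tree of K_n), by brute force."""
    edges = list(combinations(range(n), 2))
    trees = [s for s in combinations(edges, n - 1) if is_spanning_tree(n, s)]
    star = {(0, j) for j in range(1, r + 1)}
    return Fraction(sum(star <= set(t) for t in trees), len(trees))


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, det(star_matrix(r, 5)), star_probability(5, r))
```

The brute-force enumeration is only feasible for very small $n$; it is there to confirm the determinant, not to replace it.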



What can we do with these probabilities? Inclusion-exclusion fails for the same reason
as the large determinants fail - the $o(1)$ errors pile up. On the other hand, these probabilities
determine certain expectations. Write $e_1, \ldots, e_n$ again for the edges adjacent to
$v$ and $I_i$ for the indicator function which is one when $e_i \in T$ and zero otherwise; then

$$\sum_i P(e_i \in T) = \sum_i E I_i = E \sum_i I_i = E \deg(v).$$

This tells us that $E \deg(v) = n(2 + o(1))n^{-1} = 2 + o(1)$. If we try this with ordered pairs of
edges, we get

$$\sum_{i \neq j} P(e_i, e_j \in T) = \sum_{i \neq j} E I_i I_j = E \sum_{i \neq j} I_i I_j.$$

This last quantity is the expectation of the sum, over all ordered pairs of distinct edges
incident to $v$, of the quantity that is 1 if they are both in the tree and 0 otherwise. If
$\deg(v) = r$ then a one occurs in this sum $r(r - 1)$ times, so the sum is $\deg(v)(\deg(v) - 1)$.
The determinant calculation gave $P(e_i, e_j \in T) = (3 + o(1))n^{-2}$ for each $i \neq j$ so

$$E[\deg(v)(\deg(v) - 1)] = n(n - 1)(3 + o(1))n^{-2} = 3 + o(1).$$

In general, using ordered r-tuples of distinct edges gives

$$E[\deg(v)(\deg(v) - 1) \cdots (\deg(v) - r + 1)] = n(n - 1) \cdots (n - r + 1)(r + 1 + o(1))n^{-r} = r + 1 + o(1).$$

Use the notation $(A)_r$ to denote $A(A - 1) \cdots (A - r + 1)$ which is called the $r$th lower
factorial of $A$. If $Y_n$ is the random variable $\deg(v)$ then we have succinctly,

$$E(Y_n)_r = r + 1 + o(1). \qquad (8)$$

$E(Y_n)_r$ is called the $r$th factorial moment of $Y_n$.

If you remember why we are doing these calculations, you have probably guessed that
$E(X)_r = r + 1$ when $X$ is one plus a Poisson(1). This is indeed true and can be seen
easily enough from the logarithmic moment generating function $E t^X$ via the identity

$$E(X)_r = \left. \left( \frac{d}{dt} \right)^r E t^X \right|_{t=1}$$

using $E t^X = E e^{X \ln(t)} = \phi(\ln(t)) = t e^{t-1}$; consult [14, page 301] for details. All that we
need now for a Poisson limit result is a theorem saying that if the factorial moments of
$Y_n$ are each converging to the factorial moments of $X$, then $Y_n$ is actually converging in
distribution to $X$. This is worth spending a short subsection on because it is algebraically
very neat.
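
The claim that one plus a Poisson(1) has $r$th factorial moment $r + 1$ is easy to test numerically. The following Python sketch (an illustration of mine, with hypothetical function names) sums the defining series $\sum_j (j+1)_r \, e^{-1}/j!$ directly, truncating once the terms are negligible.

```python
import math


def falling(a, r):
    """Lower factorial (a)_r = a (a - 1) ... (a - r + 1)."""
    out = 1
    for i in range(r):
        out *= a - i
    return out


def factorial_moment_one_plus_poisson(r, terms=60):
    """E (X)_r for X = 1 + Poisson(1), by truncating the series at `terms` terms."""
    return sum(
        falling(j + 1, r) * math.exp(-1) / math.factorial(j) for j in range(terms)
    )


if __name__ == "__main__":
    for r in range(6):
        print(r, factorial_moment_one_plus_poisson(r))
```

Each printed value should agree with $r + 1$ up to floating-point error.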

5.3 The method of moments 

A standard piece of real analysis shows that if all the factorial moments of a sequence
of random variables converging to a limit are finite, then for each $r$, the limit of the
$r$th factorial moments is the $r$th factorial moment of the limit. (This is essentially the
Lebesgue dominated convergence theorem.) Another standard result is that if the moments
of a sequence of random variables converge, then the sequence, or at least some
subsequence, is converging in distribution to some other random variable whose moments
are the limits of the moments in the sequence. Piecing together these straightforward
facts leaves a serious gap in our prospective proof: What if there is some random variable
$Z$ distributed differently from $X$ with the same factorial moments? If this could happen,
then there would be no reason to think that $Y_n$ converged in distribution to $X$ rather than
$Z$. This scenario can actually happen - there really are differently distributed random
variables with the same moments! (See the discussion of the lognormal distribution in
[9].) Luckily this only happens when $X$ is badly behaved, and a Poisson plus one is not
badly behaved. Here then is a proof of the fact that the distribution of $X$ is the only one
with $r$th factorial moment $r + 1$ for all $r$. I will leave it to you to piece together, look up
in [9] or take on faith how this fact plus the results from real analysis imply $Y_n \to X$.

Theorem 5.1 Let $X$ be a random variable with $E(X)_r < e^{kr}$ for some $k$. Then no
random variable distributed differently from $X$ has the same factorial moments.

Proof: The factorial moments $E(X)_r$ determine the regular moments $\mu_r = EX^r$ and
vice versa by the linear relations $(X)_1 = X$; $(X)_2 = X^2 - X$, etc. From these linear
relations it also follows that factorial moments are bounded by some $e^{kr}$ if and only
if regular moments are bounded by some $e^{kr}$, thus it suffices to prove the theorem for
regular moments. Not only do the moments determine the distribution, it is even possible
to calculate $P(X = j)$ directly from the moments of $X$ in the following manner.



The characteristic function of $X$ is the function $\phi(t) = E e^{itX}$ where $i = \sqrt{-1}$. This
is determined by the moments since $E e^{itX} = E(1 + (itX) + (itX)^2/2! + \cdots) = 1 + it\mu_1 +
(it)^2 \mu_2/2! + \cdots$. We use the exponential bound on the growth of $\mu_r$ to deduce that
this is absolutely convergent for all $t$ (though a somewhat weaker condition would do).
The growth condition also shows that $E e^{itX}$ is bounded and absolutely convergent for
$t \in [0, 2\pi]$. Now $P(X = j)$ can be determined by Fourier inversion:

$$\frac{1}{2\pi} \int_0^{2\pi} e^{-ijt} \phi(t) \, dt = \frac{1}{2\pi} \int_0^{2\pi} e^{-ijt} \sum_{r \geq 0} P(X = r) e^{irt} \, dt$$

$$= \sum_{r \geq 0} P(X = r) \frac{1}{2\pi} \int_0^{2\pi} e^{i(r-j)t} \, dt$$

(switching the sum and integral is OK for bounded, absolutely convergent integrals)

$$= \sum_{r \geq 0} P(X = r) \delta_0(r - j)$$

$$= P(X = j).$$

□
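
The inversion formula can be tried out numerically on the case at hand. This Python sketch is an illustration under my own assumptions: it takes $X$ to be one plus a Poisson(1), whose characteristic function is $\phi(t) = e^{it} \exp(e^{it} - 1)$, and approximates the integral over $[0, 2\pi]$ by an equally spaced Riemann sum. The function names are hypothetical.

```python
import cmath
import math


def phi(t):
    """Characteristic function of X = 1 + Poisson(1)."""
    return cmath.exp(1j * t) * cmath.exp(cmath.exp(1j * t) - 1)


def point_mass(j, steps=256):
    """Approximate (1 / 2 pi) * integral over [0, 2 pi] of exp(-i j t) phi(t) dt."""
    total = 0
    for m in range(steps):
        t = 2 * math.pi * m / steps
        total += cmath.exp(-1j * j * t) * phi(t)
    return (total / steps).real


if __name__ == "__main__":
    for j in range(5):
        print(j, point_mass(j))
```

The recovered values can be compared with $P(X = j) = e^{-1}/(j-1)!$ for $j \geq 1$ and $P(X = 0) = 0$.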

5.4 A branching process

In the last half of section 1.4 I promised to explain how convergence in distribution of
$\deg(v)$ was a special case of convergence of $T$ near $v$ to a distribution called $\mathcal{P}_1$. (You
might want to go back and reread that section before continuing.) The infinite tree $\mathcal{P}_1$ is
interesting in its own right and I'll start making good on the promise by describing $\mathcal{P}_1$.

This begins with a short description of Galton-Watson branching processes. You can
think of a Galton-Watson process as a family tree for some fictional amoebas. These
fictional amoebas reproduce by splitting into any number of smaller amoebas (unlike
real amoebas that can only split into two parts at a time). At time $t = 0$ there is just
a single amoeba, and at each time $t = 1, 2, 3, \ldots$, each living amoeba $A$ splits into a
random number $N = N_t(A)$ of amoebas, where the random numbers are independent
and all have the same distribution $P(N_t(A) = j) = p_j$. Allow the possibility that $N = 0$
(the amoeba died) or that $N = 1$ (the amoeba didn't do anything). Let $\mu = \sum_j j p_j$ be
the mean number of amoebas produced in a split. A standard result from the theory
of branching processes [4] is that if $\mu > 1$ then there is a positive probability that the
family tree will survive forever, the population exploding exponentially as in the usual
Malthusian forecasts for human population in the twenty-first century. Conversely when
$\mu < 1$, the amoeba population dies out with probability 1 and in fact the chance of
it surviving $n$ generations decreases exponentially with $n$. When $\mu = 1$ the branching
process is said to be critical. It must still die out, but the probability of it surviving
$n$ generations decays more slowly, like a constant times $1/n$. The theory of branching
processes is quite large and you can find more details in [4] or [10].
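
The $1/n$ decay in the critical case can be watched directly. The sketch below (mine, not the author's) takes Poisson(1) offspring, for which the offspring generating function is $f(s) = e^{s-1}$, and uses the standard fact that the probability $q_n$ of extinction by generation $n$ satisfies $q_0 = 0$ and $q_{n+1} = f(q_n)$. For offspring variance 1 the classical constant in front of $1/n$ is 2.

```python
import math


def survival_probability(n):
    """P(a critical Poisson(1) Galton-Watson process is alive at generation n)."""
    q = 0.0  # extinction probability by generation 0: one amoeba is alive
    for _ in range(n):
        q = math.exp(q - 1.0)  # q_{k+1} = f(q_k) with f(s) = exp(s - 1)
    return 1.0 - q


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, n * survival_probability(n))
```

The products $n \cdot P(\text{survive } n \text{ generations})$ creep up toward 2 slowly, since the correction terms are logarithmic.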

Specialize now to the case where the random number of offspring has a Poisson(1)
distribution, i.e. $p_j = e^{-1}/j!$. Here's the motivation for considering this case. Imagine a
graph $G$ in which each vertex has $N$ neighbors and $N$ is so large it is virtually infinite.
Choose a subgraph $U$ by letting each edge be included independently with probability
$N^{-1}$. Fix a vertex $v \in G$ and look at the vertices connected to $v$ in $U$. The number
of neighbors of $v$ in $U$ has a Poisson(1) distribution by the standard characterization
of a Poisson as the limit of number of occurrences of rare events. For each neighbor $y$
of $v$ in $U$, there are $N - 1$ edges out of $y$ other than the one to $v$, and the number of
those in $U$ will again be Poisson(1) (since $N \approx \infty$, subtracting one does not matter) and
continuing this way shows that the connected component of $v$ in $U$ is distributed as a
Galton-Watson process with Poisson(1) offspring.
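
The "rare events" approximation used here is easy to quantify. This short Python sketch (an illustration with made-up function names) compares the Binomial($N$, $1/N$) distribution of the number of neighbors of $v$ in $U$ with the Poisson(1) distribution for a few values of $N$.

```python
import math


def binomial_pmf(N, k):
    """P(Binomial(N, 1/N) = k): number of retained edges out of N at one vertex."""
    p = 1.0 / N
    return math.comb(N, k) * p**k * (1 - p) ** (N - k)


def poisson1_pmf(k):
    """P(Poisson(1) = k)."""
    return math.exp(-1) / math.factorial(k)


def max_gap(N, kmax=10):
    """Largest pointwise difference between the two distributions for k <= kmax."""
    return max(abs(binomial_pmf(N, k) - poisson1_pmf(k)) for k in range(kmax + 1))


if __name__ == "__main__":
    for N in (10, 100, 1000, 10000):
        print(N, max_gap(N))
```

The gap shrinks roughly like $1/N$, which is the sense in which a virtually infinite $N$ makes the neighbor count Poisson(1).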

Of course $U$ is not distributed like a uniform spanning tree $T$. For one thing, $U$ may
with probability $e^{-1}$ fail to have any edges out of $v$. Even if this doesn't happen, the
chance of $U$ having more than $n$ vertices goes to zero as $n \to \infty$ (a critical Galton-Watson
process dies out) whereas $T$, being a spanning tree of an almost infinite graph, goes on
as far as the eye can see. The next hope is that $T$ looks like $U$ conditioned not to die
out. This should in fact seem plausible: you can check that $U$ has no cycles near $v$ since
virtually all of the $N$ edges out of each neighbor of $v$ lead further away from $v$; then a
uniform spanning tree should be a random cycle-free graph $U$ that treats each edge as
equally likely, conditioned on being connected.

The conditioning must be done carefully, since the probability of $U$ living forever is
zero, but it turns out fine if you condition on $U$ living for at least $n$ generations and
take the limit as $n \to \infty$. The random infinite tree $\mathcal{P}_1$ that results is called the incipient
infinite cluster at $v$, so named by percolation theorists (people who study connectivity
properties of random graphs). It turns out there is an alternate description for the
incipient infinite cluster. Let $v = v_0, v_1, v_2, \ldots$ be a single line of vertices with edges
$v_0 v_1, v_1 v_2, \ldots$. For each of the vertices $v_i$ independently, make a separate independent
copy $U_i$ of the critical Poisson(1) branching process $U$ with $v_i$ as the root and paste it
onto the line already there. Then this collage has the same distribution as $\mathcal{P}_1$. This fact
is the "whole tree" version of the fact that a Poisson(1) conditioned to be nonzero is
distributed as one plus a Poisson(1) (you can recover this fact from the fact about $\mathcal{P}_1$ by
looking just at the neighbors of $v$).

5.5 Tree moments

To prove that a uniform spanning tree $T_n$ of $G_n$ converges in distribution to $\mathcal{P}_1$ when
$G_n$ is Gino-regular, we generalize factorial moments to trees. Let $t$ be a finite tree rooted
at some vertex $x$ and let $W$ be a tree rooted at $v$. $W$ is allowed to be infinite but
it must be locally finite - only finitely many edges incident to any vertex. Say that
a map $f$ from the vertices of $t$ to the vertices of $W$ is a tree-map if $f$ is one to one,
maps $x$ to $v$ and neighbors to neighbors. Let $N(W; t)$ count the number of tree-maps
from $t$ into $W$. For example in the following picture, $N(W; t) = 4$, since C and D
can map to H and I in either order with A mapping to E, and B can map to F or G.

figure 11 (the trees t and W)

Define the $t$th tree-moment of a random tree $Z$ rooted at $v$ to be $EN(Z; t)$. If $t$ is an
n-star, meaning a tree consisting of $n$ edges all emanating from $x$, then a tree-map from
$t$ to $W$ is just a choice of $n$ distinct neighbors of $v$ in order, so $N(W; t) = (\deg(v))_n$.
Thus $EN(Z; t) = E(\deg(v))_n$, the $n$th factorial moment of $\deg(v)$. This is to show you
that tree-moments generalize factorial moments. Now let's see what the tree-moments
of $\mathcal{P}_1$ are. Let $t$ be any finite tree and let $|t|$ denote the number of vertices in $t$.
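
Counting tree-maps is a short recursion: the children of the root of $t$ must go to distinct children of the root of $W$, and then each subtree is mapped recursively. Here is a minimal Python sketch of that recursion. The representation (a rooted tree is the list of its root's child subtrees) and the function name are my own choices, and the example trees are only a guess at shapes consistent with the description of figure 11, which is not reproduced here.

```python
from itertools import permutations


def count_tree_maps(W, t):
    """Number of tree-maps from the finite rooted tree t into the finite rooted tree W.

    A rooted tree is represented as the list of its root's child subtrees,
    so a single vertex is [] and a 2-star is [[], []].
    """
    if len(t) > len(W):
        return 0
    total = 0
    # Send the children of t's root, in order, to distinct children of W's root.
    for targets in permutations(W, len(t)):
        ways = 1
        for sub_t, sub_W in zip(t, targets):
            ways *= count_tree_maps(sub_W, sub_t)
            if ways == 0:
                break
        total += ways
    return total


if __name__ == "__main__":
    # t: root x with children A and B, where A has children C and D.
    t = [[[], []], []]
    # W: root v with children E, F, G, where E has children H and I.
    W = [[[], []], [], []]
    print(count_tree_maps(W, t))

    # A 3-star mapped into a root with 5 leaf children: (5)_3 tree-maps.
    print(count_tree_maps([[] for _ in range(5)], [[], [], []]))
```

For these guessed shapes the count agrees with the value 4 quoted in the text, and the star example reproduces the lower factorial $(5)_3 = 60$.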

Lemma 5.2 Let $U$ be a Galton-Watson process rooted at $v$ with Poisson(1) offspring.
Then $EN(U; t) = 1$ for all finite trees $t$.

Proof: Use induction on $t$, the lemma being clear when $t$ is a single vertex. The way
the induction step works for trees is to show that if a fact is true for a collection of trees
$t_1, \ldots, t_n$ then it is true for the tree $t^*$ consisting of a root $x$ with $n$ neighbors $x_1, \ldots, x_n$
having subtrees $t_1, \ldots, t_n$ respectively as in the following illustration.

figure 12

So let $t_1, \ldots, t_n$ and $t^*$ be as above. Any tree-map $f : t^* \to U$ must map the $n$
neighbors of $x$ into distinct neighbors of $v$ in $U$ and the expected number of ways to do this
is $E(\deg(v))_n$ which is one for all $n$ since $\deg(v)$ is a Poisson(1) [9]. Now for any such
assignment of $f$ on the neighbors of $x$, the number of ways of completing the assignment
to a tree-map is the product over $i = 1, \ldots, n$ of the number of ways of mapping each
$t_i$ into the subtree of $U$ below $f(x_i)$. After conditioning on what the first generation
of $U$ looks like, the subtrees below any neighbors of $v$ are independent and themselves
Galton-Watsons with Poisson(1) offspring. (This is what it means to be Galton-Watson.)
By induction then, the expected number of ways of completing the assignment of $f$ is the
product of a bunch of ones and is therefore one. Thus $EN(U; t^*) = E(\deg(v))_n \prod_{i=1}^n 1 = 1$.
□

Back to calculating $EN(\mathcal{P}_1; t)$. Recall that $\mathcal{P}_1$ is a line $v_0, v_1, \ldots$ with Poisson(1)
branching processes $U_i$ stapled on. Each tree-map $f : t \to \mathcal{P}_1$ hits some initial segment
$v_0, \ldots, v_k$ of the original line, so there is some vertex $y_f \in t$ such that $f(y_f) = v_k$ for some
$k$ but $v_{k+1}$ is not in the image of $f$. For each $y \in t$, we count the expected number of
tree-maps $f$ for which $y_f = y$. There is a path $x = f^{-1}(v_0), \ldots, f^{-1}(v_k) = y$ in $t$ going
from the root $x$ to $y$. The remaining vertices of $t$ can be separated into $k + 1$ subtrees
below each of the $f^{-1}(v_i)$. These subtrees must then get mapped respectively into the
$U_i$. By the lemma, the expected number of ways of mapping anything into a $U_i$ is one,
so the expected number of $f$ for which $y_f = y$ is $\prod_{i=0}^{k} 1 = 1$. Summing over $y$ then gives

$$EN(\mathcal{P}_1; t) = |t|. \qquad (9)$$

The last thing we are going to do in proving the stronger Poisson convergence
theorem is to show

Lemma 5.3 Let $G_n$ be a Gino-regular sequence of graphs, and let $T_n$ be a uniform
spanning tree of $G_n$ rooted at some $v_n$. Then for any finite rooted tree $t$, $EN(T_n; t) \to |t|$
as $n \to \infty$.

It is not trivial from here to establish that $T_n \wedge r$ converges in distribution to $\mathcal{P}_1 \wedge r$ for
every $r$. The standard real analysis facts I quoted in section 5.3 about moments need to
be replaced by some not-so-standard (but not too hard) facts about tree-moments. Suffice
it to say that the previous two lemmas do in the end prove (see [6] for details)

Theorem 5.4 Let $G_n$ be a Gino-regular sequence of graphs, and let $T_n$ be a uniform
spanning tree of $G_n$ rooted at some $v_n$. Then for any $r$, $T_n \wedge r$ converges in distribution
to $\mathcal{P}_1 \wedge r$ as $n \to \infty$.

Sketch of proof of Lemma 5.3: Fix a finite $t$ rooted at $x$. To calculate the expected
number of tree-maps from $t$ into $T_n$ we will sum over every possible image of a tree-map
the probability that all of those edges are actually present in $T_n$. By an image of a
tree-map, I mean two things: (1) a collection $\{v_x : x \in t\}$ of vertices of $G_n$ indexed by
the vertices of $t$ for which $v_x \sim v_y$ in $G_n$ whenever $x \sim y$ in $t$; (2) a collection of edges $e_\epsilon$
connecting $v_x$ and $v_y$ for every edge $\epsilon \in t$ connecting some $x$ and $y$. Fix such an image.

The transfer-impedance theorem tells us that the probability of finding all the edges
$e_\epsilon$ in $T_n$ is the determinant of $M(e_\epsilon : \epsilon \in t)$. Now for edges $e, e' \in G_n$, Gino-regularity gives
that $H(e, e') = D_n^{-1}(o(1) + \kappa)$ uniformly over edges of $G_n$, where $\kappa$ is 2, 1 or 0 according
to whether $e = e'$, they share an endpoint, or they are disjoint. The determinant is then
well approximated by the corresponding determinant without the $o(1)$ terms, which can
be worked out as exactly $|t| D_n^{1-|t|}$.

This must now be summed over all possible images, which amounts to multiplying
$|t| D_n^{1-|t|}$ by the number of possible images. I claim the number of possible images is
approximately $D_n^{|t|-1}$. To see this, imagine starting at the root $x$, which must get mapped
to $v_n$, and choosing successively where to map each next vertex of $t$. Since there are
approximately $D_n$ edges coming out of each vertex of $G_n$, there are always about $D_n$
choices for the image of the next vertex (the fact that you are not allowed to choose
any vertex already chosen is insignificant as $D_n$ gets large). There are $|t| - 1$ choices, so
the number of maps is about $D_n^{|t|-1}$. This proves the claim. The claim implies that the
expected number of tree-maps from $t$ to $T_n$ is $|t| D_n^{1-|t|} D_n^{|t|-1} = |t|$, proving the lemma.
□
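
The step "which can be worked out as exactly $|t| D_n^{1-|t|}$" amounts to saying that the limiting matrix of $\kappa$ values for the edges of a tree has determinant equal to the number of vertices. The Python sketch below checks this on a few small trees. It is an illustration under my own conventions: each edge $(a, b)$ is oriented from $a$ to $b$, so the entry for two edges sharing an endpoint is $+1$ or $-1$ depending on orientation, which does not change the determinant.

```python
from fractions import Fraction


def det(M):
    """Determinant by exact Gaussian elimination on a list of rows."""
    M = [row[:] for row in M]
    n = len(M)
    d = Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d


def kappa_matrix(edges):
    """Inner products of the vectors delta_a - delta_b over oriented edges (a, b).

    Diagonal entries are 2, entries for edges sharing one endpoint are +1 or -1,
    and entries for disjoint edges are 0.
    """
    def entry(e, f):
        (a, b), (c, d) = e, f
        return (a == c) - (a == d) - (b == c) + (b == d)

    return [[Fraction(entry(e, f)) for f in edges] for e in edges]


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]                   # 4 vertices
    star = [(0, 1), (0, 2), (0, 3)]                   # 4 vertices
    mixed = [(0, 1), (0, 2), (1, 3), (1, 4), (4, 5)]  # 6 vertices
    for tree in (path, star, mixed):
        print(len(tree) + 1, det(kappa_matrix(tree)))
```

Dividing every entry by $D_n$ multiplies the determinant by $D_n^{-(|t|-1)}$, which gives the expression used in the proof.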

6 Infinite lattices, dimers and entropy 

There is, believe it or not, another model that ends up being equivalent to the uniform 
spanning tree model under a correspondence at least as surprising as the correspondence 
between spanning trees and random walks. This is the so-called dimer or domino tiling 
model, which was studied by statistical physicists quite independently of the uniform 
spanning tree model. The present section is intended to show how one of the fundamental 
questions of this model, namely calculating its entropy, can be solved using what we know 
about spanning trees. Since it's getting late, there will be pictures but no detailed proofs. 

6.1 Dimers 

A dimer is a substance that on the molecular level is made up of two smaller groups
of atoms (imagine two spheres of matter) adhering to each other via a covalent bond;
consequently it is shaped like a dumbbell. If a bunch of dimer molecules are packed
together in a cold room and a few of the less significant laws of physics are ignored, the
molecules should array themselves into some sort of regular lattice, fitting together as
snugly as dumbbells can. To model this, let $r$ be some positive real number representing
the length of one of the dumbbells. Let $L$ be a lattice, i.e. a regular array of points in
three-space, for which each point in $L$ has some neighbors at distance $r$. For example $r$
could be 1 and $L$ could be the standard integer lattice $\{(x, y, z) : x, y, z \in \mathbb{Z}\}$, so $r$ is the
minimum distance between any two points of $L$ (see the picture below). Alternatively $r$
could be $\sqrt{2}$ or $\sqrt{3}$ for the same $L$. Make a graph $G$ whose vertices are the points of $L$,
with an edge between any pair of points at distance $r$ from each other. Then the possible
packings of dimers in the lattice are just the ways of partitioning the lattice into pairs
of vertices, each pair (representing one molecule) being the two endpoints of some edge.
The following picture shows part of a packing of the integer lattice with nearest-neighbor
edges.




figure 13 

Take a large finite box inside the lattice, containing $N$ vertices. If $N$ is even and the
box is not an awkward shape, there will be not only one but many ways to pack it with
dimers. There will be several edges incident to each vertex $v$, representing a choice to be
made as to which other vertex will be covered by the molecule with one atom covering $v$.
These choices obviously cannot be made independently, but it should be plausible from
this that the total number of configurations is approximately $\gamma^N$ for some $\gamma > 1$ as $N$ goes
to infinity. This number can be written alternatively as $e^{hN}$ where $h = \ln(\gamma)$ is called the
entropy of the packing problem. The thermodynamics of the resulting substance depend
on, among other things, the entropy $h$.
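
For small boxes the number of packings can simply be counted. The following Python sketch is my own illustration for the two-dimensional nearest-neighbor case: it counts domino tilings of a rows by cols box with a standard cell-by-cell profile recursion and reports $\ln(\text{count})/N$ as a finite-box stand-in for the entropy. The function names are hypothetical.

```python
import math


def count_domino_tilings(rows, cols):
    """Number of ways to cover a rows x cols box of vertices with dimers.

    Cells are visited in row-major order. A state is a bitmask over the next
    `cols` cells (bit 0 = current cell); a set bit means that cell is already
    covered by a dimer placed earlier.
    """
    dp = {0: 1}
    for r in range(rows):
        for c in range(cols):
            new = {}
            for mask, ways in dp.items():
                if mask & 1:
                    # Current cell already covered: just advance.
                    nm = mask >> 1
                    new[nm] = new.get(nm, 0) + ways
                    continue
                # Horizontal dimer covering (r, c) and (r, c + 1).
                if c + 1 < cols and not mask & 2:
                    nm = (mask >> 1) | 1
                    new[nm] = new.get(nm, 0) + ways
                # Vertical dimer covering (r, c) and (r + 1, c).
                if r + 1 < rows:
                    nm = (mask >> 1) | (1 << (cols - 1))
                    new[nm] = new.get(nm, 0) + ways
            dp = new
    return dp.get(0, 0)


def entropy_per_vertex(rows, cols):
    """ln(number of tilings) / N for a box with N = rows * cols vertices."""
    return math.log(count_domino_tilings(rows, cols)) / (rows * cols)


if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        print(n, count_domino_tilings(n, n), entropy_per_vertex(n, n))
```

Boundary effects are large for boxes this small, so the printed ratios should be read as a rough indication of exponential growth rather than as the limiting entropy.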

The case that has been studied the most is where $L$ is the two-dimensional integer
lattice with $r = 1$. The graph $G$ is then the usual nearest-neighbor square lattice. Physically
this corresponds to packing the dimers between two slides. You can get the same
packing problem by attempting to tile the plane with dominos - vertical and horizontal
1 by 2 rectangles - which is why the model also goes by the name of domino tiling.

6.2 Dominos and spanning trees 

We have not yet talked about spanning trees of an infinite graph, but the definition
remains the same: a connected subgraph touching each vertex and containing no cycles. If
the subgraph need not be connected, it is a spanning forest. Define an essential spanning
forest or ESF to be a spanning forest that has no finite components. Informally, an ESF
is a subgraph that you can't distinguish from a spanning tree by only looking at a finite
part of it (since it has no cycles or islands).

Let $G_2$ denote the nearest-neighbor graph on the two dimensional integer lattice.
Since $G_2$ is a planar graph, it has a dual graph $G_2^*$, which has a vertex in each cell of
$G_2$ and an edge $e^*$ crossing each edge $e$ of $G_2$. In the following picture, filled circles
and heavy lines denote $G_2$ and open circles and dotted lines denote $G_2^*$. Note that $G_2$,
together with $G_2^*$ and the points where edges cross dual edges, forms another graph $\tilde{G}_2$
that is just $G_2$ scaled down by a factor of two.

figure 14

Each subgraph $H$ of $G_2$ has a dual subgraph $H^*$ consisting of all edges $e^*$ of $G_2^*$ dual
to edges $e$ not in $H$. If $H$ has a cycle, then the duals of all edges in the cycle are
absent from $H^*$ which separates $H^*$ into two components: the interior and exterior of
the cycle. Similarly an island in $H$ corresponds to a cycle in $H^*$ as in the picture:

figure 15

From this description, it is clear that $T$ is an essential spanning forest of $G_2$ if and only
if $T^*$ is an essential spanning forest of $G_2^* \cong G_2$.

Let $T$ now be an infinite tree. We define directed a little differently than in the finite
case: say $T$ is directed if the edges are oriented so that every vertex has precisely one
edge leading out of it. Following the arrows from any vertex gives an infinite path and
it is not hard to check that any two such paths from different vertices eventually merge.
Thus directedness for infinite trees is like directedness for finite trees, toward a vertex at
infinity.

Say an essential spanning forest of $G_2$ is directed if a direction has been chosen for each
of its components and each of the components of its dual. Here then is the connection
between dominos and essential spanning forests.

Let $T$ be a directed essential spanning forest of $G_2$, with dual $T^*$. Construct
a domino tiling of $\tilde{G}_2$ as follows. Each vertex $v \in V(G_2) \subseteq V(\tilde{G}_2)$ is
covered by a domino that also covers the vertex of $\tilde{G}_2$ in the middle of the
edge of $T$ that leads out of $v$. Similarly, each vertex $v^* \in V(G_2^*)$ is covered
by a domino also covering the middle of the edge of $T^*$ leading out of $v^*$. It is
easy to check that this gives a legitimate domino tiling: every domino covers
two neighboring vertices, and each vertex is covered by precisely one domino.

Conversely, for any domino tiling of $\tilde{G}_2$, directed essential spanning forests
$T$ and $T^*$ for $G_2$ and $G_2^*$ can be constructed as follows. For each $v \in V(G_2)$,
the oriented edge leading out of $v$ in $T$ is the one along which the domino
covering $v$ lies (i.e. the one whose midpoint is the other vertex of $\tilde{G}_2$ covered
by the domino covering $v$). Construct $T^*$ analogously. To show that $T$ and
$T^*$ are directed ESF's amounts to showing there are no cycles, since clearly
$T$ and $T^*$ will have one edge coming out of each vertex. This is true because
if you set up dominos in such a way as to create a cycle, they will always
enclose an odd number of vertices (check it yourself!). Then there is no way
to extend this configuration to a legitimate domino tiling of $\tilde{G}_2$.

It is easy to see that the two operations above invert each other, giving a one to
one correspondence between domino tilings of $\tilde{G}_2$ and directed essential spanning forests
of $G_2$. To bring this back into the realm of finite graphs requires ironing out some
technicalities which I am instead going to ignore. The basic idea is that domino tilings
of the 2n-torus $T_{2n}$ correspond to spanning trees of $T_n$ almost as well as domino tilings of
$\tilde{G}_2$ correspond to spanning trees of $G_2$. Going from directed essential spanning forests to
spanning trees is one of the details glossed over here, but explained somewhat in the next
subsection. The entropy for domino tilings is then one quarter the entropy for spanning
trees, since $T_{2n}$ has four times as many vertices as $T_n$. Entropy for spanning trees just
means the number $h$ for which $T_n$ has approximately $e^{hn^2}$ spanning trees. To calculate
this, we use the matrix-tree theorem.

The number of spanning trees of $T_n$ according to this theorem is the determinant of
a minor of the matrix indexed by vertices of $T_n$ whose $(v, w)$-entry is 4 if $v = w$, $-1$ if
$v \sim w$ and 0 otherwise. If $T_n$ were replaced by $n$ edges in a circle, then this would be a
circulant matrix. As is, it is a generalized circulant, with symmetry group $T_n = (\mathbb{Z}/n\mathbb{Z})^2$
instead of $\mathbb{Z}/n\mathbb{Z}$. The eigenvalues can be gotten via group representations of $T_n$, resulting
in eigenvalues $4 - 2\cos(2\pi k/n) - 2\cos(2\pi l/n)$ as $k$ and $l$ range from 0 to $n - 1$. The
determinant we want is the product of all of these except for the zero eigenvalue at
$k = l = 0$ (up to a factor of $n^2$, which does not affect the entropy). The log of the
determinant divided by $n^2$ is the average of the logarithms of these as $k$ and $l$
vary, and the entropy is the limit of this as $n \to \infty$ which is given by

$$\int_0^1 \int_0^1 \ln(4 - 2\cos(2\pi x) - 2\cos(2\pi y)) \, dx \, dy.$$
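
As a numerical sanity check on the eigenvalue formula, here is a small Python sketch of mine (function names are made up). It adds up the logarithms of the nonzero eigenvalues $4 - 2\cos(2\pi k/n) - 2\cos(2\pi l/n)$ and subtracts $\ln(n^2)$, using the standard fact that a cofactor of the Laplacian of a connected graph equals the product of its nonzero eigenvalues divided by the number of vertices. The per-vertex value is compared with $4G/\pi$, the known closed form of the limiting double integral, where $G$ is Catalan's constant.

```python
import math

CATALAN = 0.915965594177219  # Catalan's constant G


def torus_log_tree_count(n):
    """ln(number of spanning trees of the n-torus (Z/nZ)^2), via the eigenvalues."""
    total = 0.0
    for k in range(n):
        for l in range(n):
            if k == 0 and l == 0:
                continue  # skip the zero eigenvalue
            total += math.log(
                4 - 2 * math.cos(2 * math.pi * k / n) - 2 * math.cos(2 * math.pi * l / n)
            )
    # Cofactor = (product of nonzero eigenvalues) / (number of vertices).
    return total - math.log(n * n)


if __name__ == "__main__":
    print(round(math.exp(torus_log_tree_count(3))))  # exact count for the 3-torus
    for n in (8, 16, 32, 64):
        print(n, torus_log_tree_count(n) / n**2, 4 * CATALAN / math.pi)
```

Dividing the limiting value by four gives the corresponding per-vertex figure for domino tilings, in line with the one-quarter relation stated above.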



6.3 Miscellany 

The limit theorems in Section 5 involved letting $G_n$ tend to infinity locally, in the sense
that each vertex in $G_n$ had higher degree as $n$ grew larger. Instead, one may consider a
sequence such as $G_n = T_n$; clearly the n-torus converges in some sense to $G_2$ as $n \to \infty$,
so there ought to be some limit theorem. Let $T_n$ be a uniform spanning tree of $G_n$. Since
$G_n$ is not Gino-regular, the limit may not be $\mathcal{P}_1$ and in fact cannot be since the limit has
degree bounded by four. It turns out that $T_n$ converges in distribution to a random tree
$T$ called the uniform random spanning tree for the integer lattice. This works also for
any sequence of graphs converging to the three or four dimensional integer lattices [13].
Unfortunately the process breaks down in dimensions five and higher. There the uniform
spanning trees on $G_n$ do converge to a limiting distribution but instead of a
spanning tree of the lattice, you get an essential spanning forest that has infinitely many
components. If you can't see how the limit of spanning trees could be a spanning forest,
remember that an essential spanning forest is so similar to a spanning tree that you can't
tell them apart with any finite amount of information.

Another result from this study is that in dimensions 2, 3 and 4, the uniform random
spanning tree $T$ has only one path to infinity. What this really means is that any two
infinite paths must eventually join up. Not only that, but $T^*$ has the same property. That
means there is only one way to direct $T$, so that each choice of $T$ uniquely determines
a domino tiling of $\tilde{G}_2$. In this way it makes sense to speak of a uniform random domino
tiling of the plane: just choose a uniform random spanning tree and see what domino
tiling it corresponds to.

That takes care of one of the details glossed over in the previous subsection. It also
just about wraps up what I wanted to talk about in this article. As a parting note, let me
mention an open problem. Let $G$ be the infinite nearest neighbor graph on the integer
lattice in $d$ dimensions and let $T$ be the uniform spanning tree on $G$ gotten by taking
a distributional limit of uniform spanning trees on d-dimensional n-tori as $n \to \infty$ as
explained above.

Conjecture 2 Suppose $d \geq 5$. Then with probability one, each component of the essential
spanning forest has only one path to infinity, in the sense that any two infinite paths
must eventually merge.